function out=CV(X,Y,robust,hmax); %Calculates CV error rate for k=1,2,...,hmax components; %More info: for each deleted observation,i, T and R are recalculated and %and LD classification rule obtained from T and labels is used to %assign ith observation of data matrix X to one of the classes. %input: %X: data matrix(if centered, this is equivalent to csimpls by Hubert) %Y: class labels (smallest class label should be 0!) %robust: 1: RoCPLS ; 0: SIMPLS %hmax: Maximum number of components %output: % out.total:CV misclassification rate for LD; % out.k: the optimal number of k that provides the smallest % misclassification error rate %Created by: Nedret Billor & Asuman Turkmen %Updated on 04/01/2009 L=[X Y]; [nl p]=size(X); yhat=[]; for i=1:nl; fprintf('observation number: % d \n',i) A=L; A(i,:)=[]; yh=[]; for h=1:hmax; if robust==0 [R,P,T]=simpls(A(:,1:p)-repmat(mean(A(:,1:p)),nl-1,1),A(:,p+1),h); a=lda(T,A(:,p+1)+1,(L(i,1:p)-mean(A(:,1:p)))*R,L(i,p+1)+1); else outRS=RoCPLS(A(:,1:p)-repmat(mean(A(:,1:p)),nl-1,1),A(:,p+1),h,0); a=lda(outRS.T,outRS.yc+1,(L(i,1:p)-mean(A(:,1:p)))*(outRS.R),L(i,p+1)+1); end; yh=[yh a.fit-1]; end; yhat=[yhat;yh]; end; diff=[]; for h=1:hmax; diff(h)=nl-sum((Y-yhat(:,h))==0); end; out.total=diff./nl; [me index]=sort(out.total); out.k=index(1); %other required functions: function out=lda(x,y,xv,yv) % lda computes misclassification error rate for linear discriminant % analysis. Program requires at least two observations in each group % input: % x : Training data set % y : Group labels (>0) % xv : Test data % yv : Test data labels % Note: xv=x, yv=y give resubstution error rates %output: % out.total: Total misclassification %Created by: Nedret Billor & Asuman Turkmen %Updated on 04/01/2009 %Reading the data; [n p]=size(x); c=tabulate(y); m=size(c,1); n1=size(xv,1); yfit=[]; pr=c(:,2)./n; for i=1:n1 yfit(i,1)=rule(x,y,(xv(i,:))'); end; if n1==1; out.fit=yfit; else for i=1:m class(i) = sum((yv == yfit) & (yv == repmat(i,n1,1))); end; out.total=(n1-sum(class))/n1; end; %-------------------------------------------------------------------------- function result = rule(x,y,new) % rule stands for linear discriminant analysis % Program requires at least two observations in each group % Input: % % x : Training data set % y : Group labels (>0) % new : New observation % % output: % result: Group label for new observation obtained from lda [n1,p1] = size(x); t=tabulate(y); m1=size(t,1); pr1=t(:,2)./n1; LD=[]; sigma=repmat(0,p1,p1); for i=1:m1; sigma=sigma+(size(x(y==i,:),1)-1)*cov(x(y==i,:)); end; sigma=sigma./(n1-m1); for i=1:m1; LD(i)=((mean(x(y==i,:)))*inv(sigma)*new)-0.5*(mean(x(y==i,:)))*inv(sigma)*(mean(x(y==i,:)))'+log(pr1(i,1)); end; [L index]=sort(LD); result=index(:,m1);