From: Fred on 4 May 2010 03:06

Hello,
I'm quite new with Matlab. I'm having some problems with the speed of the "pdist" function in matlab 2007. I'm trying to calculate between-class distances using this particular function but it takes a hell of a long time to do it. Here is the code I'm using:

    Inter=zeros(1,244978);
    D=1;
    for i=1:(size(PCA_Scores.data,1))
        for n=i+1:(size(PCA_Scores.data,1))
            if PCA_Scores(i,:).class{1}==PCA_Scores(n,:).class{1}
            else
                Inter(1,D)=pdist([PCA_Scores(i,:).data;PCA_Scores(n,:).data],'cityblock');
                D=D+1;
            end
        end
    end

PCA_Scores is a 785x10 matrix containing the PC scores calculated from 785 near-infrared spectra (785x3112 data matrix). PCA_Scores.class{1} contains the class index for each sample (785x1 vector).

What I want to do is to calculate the distance between samples which aren't in the same class (since usually pdist calculates the between-distance of all samples in the data matrix). I've tried to preallocate the size of the final matrix, but speed doesn't seem to increase particularly.

Could someone help me with this issue?

Thanks for help
Fred
From: Peter Perkins on 4 May 2010 09:12

On 5/4/2010 3:06 AM, Fred wrote:
> Hello,
> I'm quite new with Matlab. I'm having some problems with the speed of
> "pdist" function in matlab 2007.

> What I want to do is to calculate the distance between samples which
> aren't in the same class (since usually pdist calculates the
> between-distance of all samples in the data matrix).

If you have access to R2010b, PDIST2 would be closer to what you need:

>> help pdist2
 PDIST2 Pairwise distance between two sets of observations.

<http://www.mathworks.com/access/helpdesk/help/toolbox/stat/pdist2.html>

You would still want two nested loops, but this time over classes, not over observations.

> Inter=zeros(1,244978);
> D=1;
> for i=1:(size(PCA_Scores.data,1))
> for n=i+1:(size(PCA_Scores.data,1))
> if PCA_Scores(i,:).class{1}==PCA_Scores(n,:).class{1}
> else
> Inter(1,D)=pdist([PCA_Scores(i,:).data;PCA_Scores(n,:).data],'cityblock');
> D=D+1;
> end
> end
> end
>
> PCA_Scores is a 785x10 matrix containing the PC scores calculated from
> 785 near-infrared spectra (785X3112 data matrix).

That can't be exactly true, since you're indexing into it as a structure. It's hard to tell, but it appears that you've stored all your scores in a structure array? That isn't going to be the best way to do this. What you probably want is one numeric matrix for the scores, one for the class.

You've only got 785 observations, and so vortse a's suggestion (call PDIST once, then carve away the things you don't need) makes sense. Or something like this:

    for i = 1:nclasses-1
        scoresi = scores(class==i,:);
        ni = size(scoresi,1);   % number of observations in class i
        for j = i+1:nclasses
            scoresj = scores(class==j,:);
            nj = size(scoresj,1);   % number of observations in class j
            % pdist takes a single matrix, so stack the two classes
            D = squareform(pdist([scoresi; scoresj]));
            [something] = D(1:ni,ni+1:end);   % ni-by-nj between-class block
        end
    end

where scores is a 785x10 numeric matrix and class is a 785x1 numeric vector. I've probably made mistakes, but that's the rough idea.
If you used PDIST2, you wouldn't need the call to squareform or the subscripting to pull out the off-diagonal block.

Hope this helps.
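
The "call PDIST once, then carve away the things you don't need" suggestion mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: `scores` and `classIdx` are hypothetical names for a plain numeric score matrix and a numeric class vector (the toy values stand in for the 785x10 and 785x1 data), and `repmat` is used for the comparison so that it does not rely on newer language features.

```matlab
% Hypothetical stand-in data; in practice scores would be the 785x10
% PC scores and classIdx the 785x1 numeric class index.
scores   = [0 0; 1 0; 0 2; 3 3];
classIdx = [1; 1; 2; 2];

% All pairwise city-block distances as a full symmetric n-by-n matrix.
Dfull = squareform(pdist(scores, 'cityblock'));

% Logical mask: true where the two samples belong to different classes.
n = numel(classIdx);
diffClass = repmat(classIdx, 1, n) ~= repmat(classIdx', n, 1);

% Keep each pair once (strict upper triangle) and only if the classes differ.
keep  = triu(diffClass, 1);
Inter = Dfull(keep)';   % row vector of between-class distances
```

The distances come out in column-major order of the upper triangle, which is not the same order as the double loop in the original question; the set of values should be the same. For 785 observations the full matrix is only 785x785, so memory is unlikely to be a concern.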
From: Peter Perkins on 4 May 2010 09:22

On 5/4/2010 9:12 AM, Peter Perkins wrote:
> If you have access to R2010b, PDIST2 would be closer to what you need:

Sorry, R2010a, not R2010b.
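
For readers who do have PDIST2 available, the loop over class pairs might be sketched as follows. This is an assumption-laden illustration, not code from the thread: `scores`, `classIdx`, `blocks` and `Dij` are hypothetical names, the class labels are assumed to be the integers 1..nclasses, and the toy data stands in for the real scores.

```matlab
% Hypothetical stand-in data: five samples in three classes.
scores   = [0 0; 1 0; 0 2; 3 3; 5 1];
classIdx = [1; 1; 2; 2; 3];
nclasses = max(classIdx);   % assumes labels are 1..nclasses

blocks = {};   % one row vector of distances per pair of classes
for i = 1:nclasses-1
    scoresi = scores(classIdx == i, :);
    for j = i+1:nclasses
        scoresj = scores(classIdx == j, :);
        % ni-by-nj block of city-block distances between class i and class j
        Dij = pdist2(scoresi, scoresj, 'cityblock');
        blocks{end+1} = Dij(:)';
    end
end
Inter = [blocks{:}];   % all between-class distances in one row vector
```

Because PDIST2 takes the two sets of observations separately, no within-class distances are computed at all, and no squareform call or block subscripting is needed.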