From: C T on 27 Apr 2010 15:40

How does MATLAB calculate the centroid when the distance is correlation? I tried to look at the code, but it is just too much information.

For example, if I have

    data = [1,2,3,4,5; 10,20,30,40,50; 10,9,8,6,7; 20,30,40,50,60];
    rand('twister',1);
    [idx, ctr] = kmeans(data,2,'distance','correlation');

I get:

    >> idx
    idx =
         2
         2
         1
         2

    >> ctr
    ctr =
        0.6325    0.3162         0   -0.6325   -0.3162
       -0.6325   -0.3162         0    0.3162    0.6325

How did MATLAB calculate ctr? I tried to calculate

    (each_row_in_cluster2 - mean_of_each_row_in_cluster2) / standard_deviation_of_each_row_in_cluster2

but the result is not exactly equal to ctr(2,:).

Thanks

From: Peter Perkins on 28 Apr 2010 09:22

On 4/27/2010 3:40 PM, C T wrote:
> How does matlab calculate the centroid when the distance is correlation?
> I tried to look at the code but it just too much information.

For correlation distance, the data are first normalized to have zero row mean and unit row norm (which puts them on the unit hypersphere).

    % p is the number of columns of X, i.e. p = size(X,2)
    X = X - repmat(mean(X,2),1,p);    % subtract each row's mean
    Xnorm = sqrt(sum(X.^2, 2));       % Euclidean norm of each centered row
    X = X ./ Xnorm(:,ones(1,p));      % scale each row to unit norm

As for the centroids, they are not really defined as points, but rather as directions; their magnitude is arbitrary. So given normalized data, it suffices to compute the centroid as coordinate-wise arithmetic means:

    % members indexes the rows in cluster i; counts(i) is the number of such rows
    centroids(i,:) = sum(X(members,:),1) / counts(i);

Note that in your example, each centroid has mean zero. Is the norm of the centroids important to you?
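
The steps above can be checked against the example from the question. What follows is a minimal sketch, not the actual KMEANS source: it assumes the cluster assignments idx = [2 2 1 2] reported in the question rather than running the clustering, and the names k, n, Z and members are illustrative. It also suggests why the standard-deviation attempt did not match: dividing a centered row by its standard deviation differs from dividing by its norm by a constant factor of sqrt(p-1).

```matlab
% Data and cluster assignments taken from the question (idx is assumed, not recomputed).
data = [1 2 3 4 5; 10 20 30 40 50; 10 9 8 6 7; 20 30 40 50 60];
idx  = [2; 2; 1; 2];
k    = 2;
[n, p] = size(data);

% Center each row, then scale it to unit Euclidean norm.
X = data - repmat(mean(data, 2), 1, p);
Xnorm = sqrt(sum(X.^2, 2));
X = X ./ repmat(Xnorm, 1, p);

% Centroid of each cluster: coordinate-wise mean of its normalized rows.
ctr = zeros(k, p);
for i = 1:k
    members = (idx == i);
    ctr(i, :) = sum(X(members, :), 1) / sum(members);
end
disp(ctr)

% For comparison: (row - mean) / std(row), using the default n-1 normalization of std.
% This is the same direction as X, only scaled by sqrt(p-1).
Z = (data - repmat(mean(data, 2), 1, p)) ./ repmat(std(data, 0, 2), 1, p);
disp(Z(1, :) ./ X(1, [1 2 4 5]) * 0 + Z(1, :))  % row 1 scaled by its standard deviation
```

With p = 5 columns the factor is sqrt(4) = 2, so the standardized rows of cluster 2 point in the same direction as ctr(2,:) but are twice as long. Since only the direction matters for correlation distance, the two describe the same centroid.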

From: C T on 28 Apr 2010 11:56

Peter Perkins <Peter.Perkins(a)MathRemoveThisWorks.com> wrote in message <hr9cr7$p4b$1(a)fred.mathworks.com>...
> Note that in your example, each centroid has mean zero. Is the norm of
> the centroids important to you?

Thank you! I guess I'm just curious about what MATLAB did.

From: Peter Perkins on 28 Apr 2010 14:00

On 4/28/2010 11:56 AM, C T wrote:
> I guess I'm just curious about what MATLAB did.

A bit of an explanation: K-Means the algorithm (as opposed to KMEANS the function) is supposed to minimize the sum of within-cluster point-to-centroid distances. So for squared Euclidean distance, the centroid for each cluster is the element-wise arithmetic mean; for city block distance, it's the component-wise median.

There are not a lot of distances for which the minimizer is easy to compute. Even for (unsquared) Euclidean distance, it's hard. It would seem kind of funny to choose a centroid that did not minimize that sum within its own cluster. That's why KMEANS, unlike LINKAGE, only supports five distances.
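
The point about minimizers can be illustrated on a single one-dimensional cluster. This is a small sketch with made-up sample values (x, and the helper names sqeuclid and cityblock, are purely illustrative): it compares the within-cluster sum of distances when the centroid is the mean versus the median.

```matlab
% One hypothetical 1-D cluster with an outlier.
x = [1 2 3 4 100];

% Within-cluster sum of point-to-centroid distances for a candidate centroid c.
sqeuclid  = @(c) sum((x - c).^2);   % squared Euclidean distance
cityblock = @(c) sum(abs(x - c));   % city block (L1) distance

m  = mean(x);     % candidate centroid: arithmetic mean
md = median(x);   % candidate centroid: median

fprintf('squared Euclidean: mean %g, median %g\n', sqeuclid(m),  sqeuclid(md));
fprintf('city block:        mean %g, median %g\n', cityblock(m), cityblock(md));
```

The mean gives the smaller sum under squared Euclidean distance, while the median gives the smaller sum under city block distance, which is why each distance gets its own centroid rule.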