From: Greg Heath on 22 Apr 2010 10:44 On Apr 22, 9:01 am, "Aadarsh " <imfromwa...(a)hotmail.com> wrote: > If i have a n unknown points and n unknown clusters, is there any algorithm which will try to group the points into the most appropriate groups. > > I could use K-Means clustering algorithm but with that you have to give the predicted number of groups. For my problem, I do not know how many groups there will be - it changes each with each new run of the algorithm. > For k-means you give the maximum possible number of groups. Some may be empty. Greg
From: Doug Schwarz on 22 Apr 2010 12:13 Greg Heath wrote: > On Apr 22, 9:01 am, "Aadarsh " <imfromwa...(a)hotmail.com> wrote: >> If i have a n unknown points and n unknown clusters, is there any algorithm which will try to group the points into the most appropriate groups. >> >> I could use K-Means clustering algorithm but with that you have to give the predicted number of groups. For my problem, I do not know how many groups there will be - it changes each with each new run of the algorithm. >> > > > For k-means you give the maximum possible number of groups. > Some may be empty. > > Greg It is possible for kmeans to return empty clusters, but it is not the normal behavior. Try this: s = randn(10000,2); idx = kmeans(s,2); gplotmatrix(s,[],idx) and you will see that kmeans has split the single cluster into two, as requested. As to the OP's original question: yes, but it's not an easy problem. I'm no expert, but I have recently implemented something called Superparamagnetic Clustering. It's too complicated to explain here, but, within limits, it is capable of determining the number of clusters. It is slow. If the range of number of clusters is small and your data is suitable well behaved then just run kmeans multiple times with different values of k and use the third and/or fourth outputs to pick the best one. -- Doug Schwarz dmschwarz&ieee,org Make obvious changes to get real email address.
From: Greg Heath on 22 Apr 2010 14:48 On Apr 22, 12:13 pm, Doug Schwarz <s...(a)sig.for.address.edu> wrote: > Greg Heath wrote: > > On Apr 22, 9:01 am, "Aadarsh " <imfromwa...(a)hotmail.com> wrote: > >> If i have a n unknown points and n unknown clusters, is there any algorithm which will try to group the points into the most appropriate groups. > > >> I could use K-Means clustering algorithm but with that you have to give the predicted number of groups. For my problem, I do not know how many groups there will be - it changes each with each new run of the algorithm. > > > For k-means you give the maximum possible number of groups. > > Some may be empty. > > > Greg > > It is possible for kmeans to return empty clusters, but it is not the > normal behavior. Try this: > > s = randn(10000,2); > idx = kmeans(s,2); > gplotmatrix(s,[],idx) > > and you will see that kmeans has split the single cluster into two, as > requested. .... and I have used MATLABs kmeans with empty cluster results. What is "normal" should be understood with a full realization of what can occur. Canned kmeans tends to be too restrictive With my own clustering codes (FORTRAN/decades ago) I introduced initialization options, some of which would encourage empty clusters. > As to the OP's original question: yes, but it's not an easy problem. > I'm no expert, but I have recently implemented something called > Superparamagnetic Clustering. It's too complicated to explain here, Is it anything like the popular SuperDuperparaelectric algorithm? > but, within limits, it is capable of determining the number of clusters. > It is slow. If the range of number of clusters is small and your data > is suitable well behaved then just run kmeans multiple times with > different values of k and use the third and/or fourth outputs to pick > the best one. Greg
From: Doug Schwarz on 22 Apr 2010 15:13 Greg Heath wrote: > On Apr 22, 12:13 pm, Doug Schwarz <s...(a)sig.for.address.edu> wrote: >> Greg Heath wrote: >> > On Apr 22, 9:01 am, "Aadarsh " <imfromwa...(a)hotmail.com> wrote: >> >> If i have a n unknown points and n unknown clusters, is there any algorithm which will try to group the points into the most appropriate groups. >> >> >> I could use K-Means clustering algorithm but with that you have to give the predicted number of groups. For my problem, I do not know how many groups there will be - it changes each with each new run of the algorithm. >> >> > For k-means you give the maximum possible number of groups. >> > Some may be empty. >> >> > Greg >> >> It is possible for kmeans to return empty clusters, but it is not the >> normal behavior. Try this: >> >> s = randn(10000,2); >> idx = kmeans(s,2); >> gplotmatrix(s,[],idx) >> >> and you will see that kmeans has split the single cluster into two, as >> requested. > > ... and I have used MATLABs kmeans with empty cluster results. And I agreed that that can happen, but you implied that one could simply enter the maximum number of clusters expected and kmeans will "do the right thing" and return the "true" number of clusters. That is not usually true and I illustrated it with a trivial example. >> As to the OP's original question: yes, but it's not an easy problem. >> I'm no expert, but I have recently implemented something called >> Superparamagnetic Clustering. It's too complicated to explain here, > > Is it anything like the popular SuperDuperparaelectric algorithm? Are you trying to discredit my opinion by poking fun at the name? Hey, I didn't invent it, I merely wrote an implementation. The algorithm is analogous to the way magnetic domains are formed in a ferrous material. When the temperature is low all the domains are oriented in the same direction (ferromagnetic phase -- one cluster); when the temperature is high the domains are randomly oriented (paramagnetic phase -- many clusters). The goal is the find the intermediate temperature that gives rise to the superparamagnetic phase at which points that are close change state together. Google it for more details. The paper I found to be easiest to understand is called "Data Clustering Using a Model Granular Magnet" by Blatt, Wiseman and Domany. -- Doug Schwarz dmschwarz&ieee,org Make obvious changes to get real email address.
From: Aadarsh on 22 Apr 2010 17:42
"Aadarsh " <imfromwales(a)hotmail.com> wrote in message <hqphal$40s$1(a)fred.mathworks.com>... > If i have a n unknown points and n unknown clusters, is there any algorithm which will try to group the points into the most appropriate groups. > > I could use K-Means clustering algorithm but with that you have to give the predicted number of groups. For my problem, I do not know how many groups there will be - it changes each with each new run of the algorithm. > > Thanks Thanks for the information, ive taken it on board and decided to go with matlab's built in kmeans(), and set the maximum number of clusters to be 1/10 the total number of points to be clustered. This seems to be the best way of doing it with the knwledge i have, although im sure (as someone mentioned earlier) to take an iterative approach and to try to find the best model. How can I do this? Below is my code: for i=1:clusters [idx, c, sumd,D] = kmeans(correct,i*6, 'distance','cityblock'); avDistance = mean(sumd); i=i+1; end where clusters is 1/10 the number of points to cluster. I have looked into looking at the distance between each point to its centroid (avDistance above), but this is bound to keep getting smaller the more clusters I get. How can i set a good tolerance or stopping creterion for this iterative approach? Can anyone suggest any better method of stopping? Thanks a lot :) |