From: Ardavan on 17 Apr 2010 17:35

Hi,

Matlab's kmeans runs excruciatingly slowly on large data sets (e.g. 128x10000). Is this normal, or am I doing something wrong?

[idx, centers] = kmeans(discriptors', 250);

(discriptors is 128x10000.)

I tried increasing the number of clusters, and I also tried increasing MaxIter to 500, because kmeans wasn't converging within 100 iterations.

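For reference, an iteration limit such as the MaxIter of 500 mentioned above is passed to kmeans through a statset options structure (the default limit is 100). A minimal sketch, assuming the Statistics Toolbox is installed; the random matrix is only a placeholder with the post's 128x10000 shape, not real descriptor data.

```matlab
% Placeholder with the same shape as the post's data: 128 features x 10000 points.
% (Variable name kept as in the original post.)
discriptors = rand(128, 10000);
X = discriptors';                     % kmeans expects one observation per row (10000x128)

% Raise the iteration limit from the default 100 to 500 and ask kmeans
% to print a short summary when it finishes.
opts = statset('MaxIter', 500, 'Display', 'final');
[idx, centers] = kmeans(X, 250, 'Options', opts);

% idx(i) is the cluster (1..250) assigned to point i;
% centers is 250x128, one cluster center per row.
```

The transpose matters: kmeans treats rows as points, so passing the 128x10000 matrix untransposed would cluster 128 "points" of dimension 10000.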
From: Doug Schwarz on 17 Apr 2010 21:41

Ardavan wrote:
> Hi,
>
> Matlab's kmeans runs excruciatingly slowly on large data sets (e.g. 128x10000).
> Is this normal or am I doing something wrong?
>
> [idx, centers] = kmeans(discriptors', 250);
[snip]

It's not the size of the data set (10000 points in 128-dimensional space is quite modest) but the number of clusters you are requesting (250) that makes it take so long.

-- 
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.

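Why the number of clusters dominates can be seen from what one k-means iteration does. The sketch below writes out a single iteration of Lloyd's algorithm (the standard batch k-means update) in base MATLAB. It is an illustration, not MATLAB's own kmeans code, and the data are random placeholders. The assignment step builds an n-by-k matrix of point-to-center distances, so each iteration costs on the order of n*k*d operations: with n = 10000 and d = 128 fixed, going from k = 10 to k = 250 multiplies that work by 25.

```matlab
% Sizes matching the post: n points, d dimensions, k clusters.
n = 10000; d = 128; k = 250;
X = rand(n, d);                          % placeholder data, one point per row
p = randperm(n);
C = X(p(1:k), :);                        % initial centers: k distinct random points

% Assignment step: squared Euclidean distance from every point to every center,
% using ||x - c||^2 = ||x||^2 + ||c||^2 - 2*x*c'. D is n-by-k, so this step
% (and its cost) grows linearly with the number of clusters k.
D = bsxfun(@plus, sum(X.^2, 2), sum(C.^2, 2)') - 2 * (X * C');
[~, idx] = min(D, [], 2);                % index of the nearest center for each point

% Update step: move each center to the mean of the points assigned to it.
for j = 1:k
    members = (idx == j);
    if any(members)
        C(j, :) = mean(X(members, :), 1);
    end                                  % an empty cluster keeps its old center here
end
```

kmeans repeats these two steps until the assignments stop changing or MaxIter is reached, so the per-iteration cost above is paid on every iteration.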
From: Ardavan on 18 Apr 2010 10:32

Doug Schwarz wrote:
[snip]
> It's not the size of the data set (10000 points in 128-dimensional space
> is quite modest) but the number of clusters you are requesting (250)
> that makes it take so long.

Hi Doug,

Thanks for responding to my question. To get a reasonable response time from kmeans, what number of clusters would you choose for a data set of 10000 points in 128 dimensions?

Thanks

From: Doug Schwarz on 18 Apr 2010 13:28

Ardavan wrote:
> Doug Schwarz wrote:
[snip]
> > It's not the size of the data set (10000 points in 128-dimensional space
> > is quite modest) but the number of clusters you are requesting (250)
> > that makes it take so long.
>
> Hi Doug,
>
> Thanks for responding to my question. To get a reasonable response time from
> kmeans, what number of clusters would you choose for a data set of 10000
> points in 128 dimensions?

I have only ever run kmeans with a small number of clusters, say fewer than about 20. I have noticed that it takes longer to run with k = 10 than with k = 3, so I can only imagine how long it would take with k = 250.

The correct value depends on how *you* want to cluster the data. It is a weakness of kmeans that you have to specify the number of clusters, so if you're not sure about that, you have to run it with different values of k and pick the result you like best.

Since I have no idea what your data look like, what they represent, or what you are trying to do, I can't really help you pick k, but it is easy to try small values and see what happens.

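Trying several small values of k, as suggested above, is easy to script. A minimal sketch, assuming the Statistics Toolbox; X is random placeholder data (substitute the real 10000x128 matrix, i.e. discriptors'), and the list of k values is an arbitrary choice for illustration. For each k it records the run time and the total within-cluster sum of point-to-center distances (squared Euclidean by default), taken from the third output of kmeans.

```matlab
X = rand(10000, 128);          % placeholder: replace with discriptors'
ks = [2 3 5 10 20];            % hypothetical candidate numbers of clusters
totalWithin = zeros(size(ks));
elapsed = zeros(size(ks));
for i = 1:numel(ks)
    tic;
    % 'Replicates' runs kmeans 3 times from different random starts and
    % keeps the solution with the smallest total within-cluster distance.
    [~, ~, sumd] = kmeans(X, ks(i), 'Replicates', 3);
    elapsed(i) = toc;
    totalWithin(i) = sum(sumd);   % sumd holds one within-cluster sum per cluster
end
% One row per k: number of clusters, total within-cluster distance, seconds.
disp([ks(:) totalWithin(:) elapsed(:)]);
```

The total within-cluster distance tends to shrink as k grows, so its minimum is not a useful target; a common heuristic is to look for the k beyond which adding clusters stops reducing it much, then judge whether those clusters make sense for the application.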