From: Ardavan on
Hi,

Matlab's kmeans runs excruciatingly slowly on large data sets (e.g. 128x10000). Is this normal or am I doing something wrong?

[idx, centers] = kmeans(discriptors', 250);

(discriptors is 128x10000, i.e. 10000 descriptors of 128 dimensions each)


I tried increasing the number of clusters and also tried increasing MaxIter to 500, because kmeans wasn't converging within the default limit of 100 iterations.
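A minimal sketch of the same call with the iteration cap raised, assuming the Statistics Toolbox kmeans and its documented 'MaxIter' and 'Display' name-value options. The random matrix is a hypothetical stand-in for the real descriptors (the variable name is kept from the post). kmeans treats each row as one observation, which is why the 128x10000 matrix is transposed:

```matlab
% Hypothetical stand-in for the real data: 10000 descriptors stored as
% the columns of a 128x10000 matrix, as described in the post.
discriptors = rand(128, 10000);

% kmeans expects one observation per ROW, so pass the 10000x128 transpose.
% 'MaxIter' raises the iteration limit from its default of 100, and
% 'Display','final' prints a one-line summary when kmeans stops.
[idx, centers] = kmeans(discriptors', 250, 'MaxIter', 500, 'Display', 'final');

size(idx)      % 10000-by-1: cluster index of each descriptor
size(centers)  % 250-by-128: one center per row
```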
From: Doug Schwarz on
In article <hqd9i9$kdu$1(a)fred.mathworks.com>,
"Ardavan " <rustpanjguh(a)gmail.com> wrote:

> Hi,
>
> Matlab's kmeans runs excruciatingly slowly on large data sets (e.g. 128x10000).
> Is this normal or am I doing something wrong?
>
> [idx, centers] = kmeans(discriptors', 250);
>
> (discriptors is 128x10000, i.e. 10000 descriptors of 128 dimensions each)
>
>
> I tried increasing the number of clusters and also tried increasing
> MaxIter to 500, because kmeans wasn't converging within the default
> limit of 100 iterations.


It's not the size of the data set (10000 points in 128-dimensional space
is quite modest) but the number of clusters you are requesting (250)
that makes it take so long: every iteration computes the distance from
each of the 10000 points to each of the 250 centers.
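As a back-of-the-envelope check (an approximation that ignores how many iterations are needed and any vectorization details), the work in one iteration grows with n*k*d, where n is the number of points, k the number of clusters and d the dimension. The list of k values is chosen only for illustration:

```matlab
% Rough cost of ONE k-means iteration: every point is compared with every
% center, and each squared Euclidean distance touches all d coordinates.
n = 10000;   % number of points (descriptors)
d = 128;     % dimensions per point
for k = [3 10 20 250]
    ops = n * k * d;   % approximate coordinate operations per iteration
    fprintf('k = %3d: about %.2e coordinate operations per iteration\n', k, ops);
end
```

Going from k = 10 to k = 250 makes each iteration about 25 times more expensive, before counting any extra iterations needed to converge.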

--
Doug Schwarz
dmschwarz&ieee,org
Make obvious changes to get real email address.
From: Ardavan on
Doug Schwarz <see(a)sig.for.address.edu> wrote in message <see-C7D421.21414117042010(a)news.frontiernet.net>...
> It's not the size of the data set (10000 points in 128-dimensional space
> is quite modest), but the number of clusters you are requesting (250)
> that makes it take so long.

Hi Doug,

Thanks for responding to my question. To get a reasonable response time from kmeans, what number of clusters would you choose for a data set of 10000 points in 128 dimensions?

Thanks
From: Doug Schwarz on
In article <hqf55k$mb6$1(a)fred.mathworks.com>,
"Ardavan " <rustpanjguh(a)gmail.com> wrote:

> Doug Schwarz <see(a)sig.for.address.edu> wrote in message
> <see-C7D421.21414117042010(a)news.frontiernet.net>...
> > In article <hqd9i9$kdu$1(a)fred.mathworks.com>,
> > "Ardavan " <rustpanjguh(a)gmail.com> wrote:
> > > Matlab's kmeans runs excruciatingly slowly on large data sets (e.g.
> > > 128x10000).
> > > Is this normal or am I doing something wrong?
> > >
> > > [idx, centers] = kmeans(discriptors', 250);

[snip]

> > It's not the size of the data set (10000 points in 128-dimensional space
> > is quite modest), but the number of clusters you are requesting (250)
> > that makes it take so long.
>
> Hi Doug,
>
> Thanks for responding to my question. To get a reasonable response time from
> kmeans, what number of clusters would you choose for a data set of 10000
> points in 128 dimensions?

I have only ever run kmeans for a small number of clusters, say fewer
than about 20. I have noticed that it takes longer to run with k = 10
than with k = 3, so I can only imagine how long it would take with k = 250.

The correct value depends on how *you* want to cluster the data. It is
a weakness of kmeans that you have to specify the number of clusters, so
if you're not sure about that, you have to run it with several values of
k and pick the result you like best.
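One way to do that, sketched under assumptions (random stand-in data, a hypothetical list of k values, and 'Replicates' set to 3): run kmeans for each k and compare the total within-cluster distance from its third output, sumd, together with the run time.

```matlab
% Hypothetical stand-in data: one row per point, as kmeans expects.
X = rand(2000, 128);
ks = [2 3 5 10 15 20];           % candidate cluster counts (illustrative)
totalDist = zeros(size(ks));
elapsed   = zeros(size(ks));
for i = 1:numel(ks)
    tic;
    % sumd(j) is the sum of point-to-center distances within cluster j
    % (squared Euclidean by default); 'Replicates' keeps the best of 3 runs.
    [~, ~, sumd] = kmeans(X, ks(i), 'Replicates', 3);
    elapsed(i)   = toc;
    totalDist(i) = sum(sumd);
    fprintf('k = %2d: total within-cluster distance %.4g (%.2f s)\n', ...
            ks(i), totalDist(i), elapsed(i));
end
```

The total generally falls as k grows, so rather than looking for its minimum, look for the value of k beyond which it stops improving much (the so-called elbow).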

Since I have no idea what your data look like, what they represent or
what you are trying to do, I can't really help you pick k, but it is easy
to try small values and see what happens.
