From: soundslikedrew Bhargava on
I have a medium-large matrix of binary data (500K rows x 50 columns, entries 0 or 1 only). The matrix is sparse: about 8% of the entries are nonzero.

I would like to cluster it robustly, meaning that if I run the clustering again I should get mostly the same result each time. (Obviously the cluster ID labels themselves can differ from run to run.)
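
To make "mostly the same result" concrete, here is a minimal sketch of one way to score two runs so that permuted cluster IDs do not matter. It counts the fraction of point pairs on which the two runs agree (the Rand index). It is written in Python purely for illustration; the function name rand_index and the toy label vectors are made up for this example. The loop is quadratic in the number of points, so on 500K rows it would have to be applied to a random subsample.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees when both clusterings put the two points together,
    or both put them apart. Cluster ids are never compared across runs,
    so relabeling the clusters does not change the score.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    # With fewer than two points there are no pairs to disagree on.
    return agree / total if total else 1.0

run1 = [0, 0, 1, 1, 2, 2]
run2 = [2, 2, 0, 0, 1, 1]  # same partition as run1, ids permuted
run3 = [0, 1, 1, 2, 2, 0]  # a genuinely different partition

print(rand_index(run1, run2))  # 1.0
print(rand_index(run1, run3))  # 0.6
```

A score of 1.0 means the two runs produced the same partition; lower values mean more pairs of rows were grouped differently.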

I'd be happy with the number of clusters being somewhere between, say, 4 and 8.

What are my choices?

1. Obviously hierarchical clustering will not be able to handle the size here.
2. K-means - I've tried it with the following distance measures:
a) hamming
b) sqeuclidean
c) correlation
d) cosine

- all of these with limited success. I do get 'some' clustering, but if I run it more than once, I do not get similar results.

3. I've also tried computing the SVD, reducing the number of columns and then running K-means on the reduced 'U*S' matrix. Here too, I tried correlation, cosine and squared Euclidean as metrics. Again, the clusterings are not consistent from run to run.
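
For reference, option 2a can be sketched roughly as below, on toy data and in Python purely for illustration. This is not the exact code I ran: the names hamming, majority_center, kmeans_hamming and best_of are hypothetical, and the center update (column-wise majority vote, which minimizes the summed Hamming distance within a cluster) is an assumption about how a Hamming K-means would behave. The sketch starts from several random initializations and keeps the run with the lowest total distance, since a single start can settle in a poor local optimum.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length 0/1 rows differ."""
    return sum(x != y for x, y in zip(a, b))

def majority_center(rows, width):
    """Column-wise majority vote over the rows; ties go to 0."""
    return [1 if 2 * sum(r[c] for r in rows) > len(rows) else 0
            for c in range(width)]

def kmeans_hamming(data, k, seed, max_iter=100):
    """One K-means run with Hamming distance from one random start."""
    rng = random.Random(seed)
    width = len(data[0])
    centers = [list(r) for r in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every row to its nearest center (first one wins ties).
        new_labels = [min(range(k), key=lambda c: hamming(row, centers[c]))
                      for row in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each non-empty cluster's center; empty ones keep theirs.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = majority_center(members, width)
    cost = sum(hamming(row, centers[lab]) for row, lab in zip(data, labels))
    return labels, cost

def best_of(data, k, n_starts, seed=0):
    """Run several random starts and keep the lowest-cost clustering."""
    runs = [kmeans_hamming(data, k, seed + s) for s in range(n_starts)]
    return min(runs, key=lambda run: run[1])

# Toy data: two loose groups of binary rows.
toy = [
    [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1],
]
labels, cost = best_of(toy, k=2, n_starts=10)
print(labels, cost)
```

Comparing the labels from best_of across different base seeds would show whether keeping the best of several starts is enough to make the result repeatable.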

Any ideas for me?

Thanks
DB