From: soundslikedrew Bhargava on 28 Jun 2010 12:22

I have a medium-large (500K rows x 50 columns) matrix of binary data (0/1 entries only). The matrix is sparse: about 8% of the entries are nonzero. I would like to cluster it robustly, meaning that if I run the clustering again, I should get mostly the same result each time. (Obviously the cluster ID labels themselves can differ from run to run.) I'd be happy with the number of clusters being somewhere between 4 and 8.

What are my choices?

1. Hierarchical clustering will not be able to handle the size here.
2. K-means: I've tried a) hamming, b) sqeuclidean, c) correlation and d) cosine distances, all with limited success. I do get 'some' clustering, but if I run it more than once, I do not get similar results.
3. I've also tried computing the SVD, reducing the number of columns, and then running K-means on the reduced 'U*S' matrix. Here too, I tried correlation, cosine and squared Euclidean as metrics. Again, the clusterings are not consistent from run to run.

Any ideas for me?

Thanks
DB
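
The requirement that two runs give "mostly the same result" even when the cluster ID labels differ can be made measurable. One common choice, which is an assumption here and not something named in the post, is the adjusted Rand index: it compares how pairs of rows are grouped in two runs, so relabelling the clusters does not change the score. The sketch below is in plain Python for illustration only; the function name and the toy label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings, invariant to label permutation.

    1.0 means identical partitions; values near 0 mean agreement no
    better than chance (the score can be slightly negative).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two rows: nothing to disagree about

    # Contingency counts: how many rows fall in cluster i of run A
    # and cluster j of run B, plus the marginal cluster sizes.
    joint = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    # Number of row pairs placed together in both runs / in each run.
    together_both = sum(comb(c, 2) for c in joint.values())
    together_a = sum(comb(c, 2) for c in sizes_a.values())
    together_b = sum(comb(c, 2) for c in sizes_b.values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both runs put all rows in one cluster
    return (together_both - expected) / (maximum - expected)


# Toy example: run_2 is run_1 with the cluster IDs renamed.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_index(run_1, run_2))
```

Scoring every pair of repeated runs this way gives a single number per distance metric, which makes it easier to compare how stable the hamming, sqeuclidean, correlation and cosine variants really are.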