From: Andrew on
Hello all,

I have a 2D scatter plot of 500-2000 points. I have clustered the data based on their location and nearness to other points in the plot. What I would like to do is select several clusters on the plot and combine them into a single cluster. My main issue has been finding a way to select a number of clusters from the graph. The brush tool seemed like the obvious first approach. However, I did not know how to get the brush tool to automatically return which clusters had been highlighted so I could combine them. I tried opening the brush tool and rewriting some of its code to save the selected points to a variable once the mouse button was released, but I have been having difficulties with that approach. I was just curious if anyone had any better ideas about how I can allow the user to pick several clusters from the plot and return them as variables to the workspace so the program can combine them into a single cluster.

Thanks!
From: Richard Willey on
Hi Andrew

You might want to take a look at the dataset array in Statistics Toolbox.

The dataset array isn't an absolute requirement to solve your problem;
however, I find that structuring your information makes further analysis
much easier.

This example doesn't use brushing; however, I'd argue that the command-line
implementation is at least as easy...

% Generate some fake clusters using mvnrnd


MU1 = [1 2];

SIGMA1 = [2 0; 0 .5];

cluster1 = mvnrnd(MU1,SIGMA1,100);


MU2 = [-3 -5];

SIGMA2 = [1 0; 0 1];

cluster2 = mvnrnd(MU2,SIGMA2,100);


MU3 = [4 -5];

SIGMA3 = [.5 0; 0 1.5];

cluster3 = mvnrnd(MU3,SIGMA3,100);


MU4 = [-2 -9];

SIGMA4 = [.1 0; 0 .3];

cluster4 = mvnrnd(MU4,SIGMA4,100);


data = vertcat(cluster1, cluster2, cluster3, cluster4);


X_Data = data(:,1);

Y_Data = data(:,2);


% Label each block of 100 rows with its cluster number (1 to 4)

Grouping_Var = [ones(100,1); 2*ones(100,1); 3*ones(100,1); 4*ones(100,1)];


% Store this all in a dataset array

ds = dataset(X_Data, Y_Data, Grouping_Var);

ds.Grouping_Var = nominal(ds.Grouping_Var);


%%

% Use a group scatter plot to visualize my data

gscatter(ds.X_Data, ds.Y_Data, ds.Grouping_Var);


%%

% Combine clusters 4 and 2 by relabeling every point in cluster 4 as cluster 2

ds.Grouping_Var(ds.Grouping_Var == '4') = '2';


%%

% Plot again to confirm that clusters 2 and 4 now form a single group

gscatter(ds.X_Data, ds.Y_Data, ds.Grouping_Var);
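
If picking the clusters with the mouse is still preferred, one possible route is ginput rather than the brush tool. The following is only a rough sketch, not a tested solution: it assumes the gscatter plot above is the current figure and that ds is the dataset array built earlier. The names nPicks, picked and nearest are made up for illustration. The user clicks once inside each cluster to merge, and the cluster of the data point nearest to each click is treated as selected.

```matlab
% Sketch only: assumes ds and the gscatter figure from above already exist.
nPicks = 2;                          % hypothetical: how many clusters to merge

[xp, yp] = ginput(nPicks);           % wait for one mouse click per cluster

picked = cell(nPicks, 1);
for k = 1:nPicks
    % Squared distance from click k to every data point
    d2 = (ds.X_Data - xp(k)).^2 + (ds.Y_Data - yp(k)).^2;
    [~, nearest] = min(d2);          % index of the closest point
    picked{k} = char(ds.Grouping_Var(nearest));   % its cluster label as text
end

% Relabel every picked cluster as the first one that was clicked
for k = 2:nPicks
    ds.Grouping_Var(ds.Grouping_Var == picked{k}) = picked{1};
end

% Plot again to check the result
gscatter(ds.X_Data, ds.Y_Data, ds.Grouping_Var);
```

Selecting by nearest point keeps the code short, but a click in empty space between clusters will still pick whichever point happens to be closest, so it may be worth plotting again and checking before going further.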