From: Greg Heath on
Mar 29, 2010 09:27:24 AM, jorgejaviergutierrez(a)gmail.com wrote:

> Hello Greg, it's me again. I've been reading your responses in the RBF
> posts and I have a question about MINDST (the minimum distance
> between clusters of different classes). How do I calculate it?
> Using dist (Matlab) I have the following example (part of
> the real data series):
>
>P = [44,20,37,12,70,16; 20,37,12,70,16,29;...
> 37,12,70,16,29,22; 12,70,16,29,22,0];
>T = [70,16,29,22,0,28];

If this is a classification problem it is not well specified.

1. How many classes do you have?
2. How many vectors in each class?
3. The columns of T should be columns of the identity matrix,
with the "1" in the row corresponding to the class of the input vector.
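
For example, a minimal sketch using IND2VEC (the class labels below are
just illustrative):

classLabels = [1 2 1 3 2 1];     % hypothetical: one class label per input vector
T = full(ind2vec(classLabels))   % each column is a column of the identity matrix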

>dist(P,P')
>ans =
>
> 0 89.1011 54.1756 82.5470
> 89.1011 0 86.0930 61.0492
> 54.1756 86.0930 0 87.2181
> 82.5470 61.0492 87.2181 0

P contains 6 4-dimensional column vectors. The 36 distances
between those 6 vectors are given by

dist(P',P)

What you have calculated are the 16 distances between the 4
6-dimensional row vectors of P.

>min(dist(P,P'))
>ans =
> 0 0 0 0
>
> or 54.1756???

What do you think? ... The diagonal contains the distances
between each vector and itself.
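
If you want the smallest distance between two DIFFERENT vectors,
exclude the diagonal. A minimal sketch with the P above:

D = dist(P',P);                 % 6-by-6 distances between the 6 column vectors
D(1:size(D,1)+1:end) = Inf;     % overwrite the zero diagonal
minDst = min(D(:))              % smallest distance between two different vectors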

>Thanks in advance
>Jorge

>P.S.: regression problem; length(X)=159; delay vector [4 3 2 1];
>preprocess prestd or premnmx; PCA prepca; RBF newrb.

If this is a regression problem why are you considering
clusters and classes?

In most classification problems the vectors of each class
tend to cluster together. For example, in 2-D you may have
4 clusters of data grouped around the cluster centers
(-1,-2), (-1,2), (2,1), (-2,2). Centers 1 and 3 may belong to
class 1, center 2 to class 2, and center 4 to class 3.

Either the clusters are obvious from the specification of the
problem or you have to use a clustering function like KMEANS
to cluster the data. You can cluster all of the data together
and see which clusters appear to belong to which classes
or you can cluster each class separately. In the latter
case, clusters from different classes may intersect, which
may indicate that you need to specify more clusters, or it
may mean that the classes cannot be separated without
error.
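
For example, a rough sketch using KMEANS from the Statistics Toolbox
(the data, dimensions and cluster counts below are illustrative):

X1 = randn(20,4) + 2;  X2 = randn(20,4) - 2;  % 20 4-D vectors per class, rows = observations
k1 = 2;  k2 = 2;                              % clusters per class
[idx1, C1] = kmeans(X1, k1);                  % C1 is k1-by-4, one cluster center per row
[idx2, C2] = kmeans(X2, k2);
D12 = dist(C1, C2');                          % k1-by-k2 distances between centers of different classes
MINDST = min(D12(:))                          % minimum distance between clusters of different classes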

What does the following have to do with this problem?

Greg

On 12 sep 2007, 15:41, Greg Heath <he...(a)alumni.brown.edu> wrote:
> On Sep 12, 6:16 am, Agata <agata.wyleg...(a)gmail.com> wrote:
>
> > On 10 Wrz, 20:26, Greg Heath <he...(a)alumni.brown.edu> wrote:
>
> > > On Sep 10, 1:40 pm, Agata <agata.wyleg...(a)gmail.com> wrote:
> -----SNIP
> > > > > > ... I don't think that the performance of the network is correct.
> > > > > > Obviously I'm doing something wrong.
> > > > > > Here's the rest of my code:
>
> > > > > > [trainV, valV, testV] = dividevec(P, T, 0.10, 0.10);
> > > > > Can't comment on this; I don't have dividevec in my older
> > > > > version.
>
> Current comment:
>
> 0.10*72 = 7.2 < 9 ==> not all 9 classes can be represented in the val
> and test sets.
>
> In other words:
>
> Your data set of 72 is too small for practical random sampling with
> ratios 0.8/0.1/0.1 from the class mixture. The result will be
> severely unbalanced training.
>
> Better to use stratified sampling to keep each design relatively
> balanced.
>
> Since you only have 8 vectors in each of the 9 classes, take 1 vector
> for validation and 1 for testing from each class. That leaves 6 vectors
> for training. There are 8*7 = 56 ways to do this for each class.
>
> Therefore there are 56^9 ways to obtain train/val/tst splits of
> 54/9/9.
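>
> A rough sketch of one such split (hypothetical names; classLabels is a
> 1-by-72 vector of labels 1..9 with 8 vectors per class):
>
> classLabels = repmat(1:9, 1, 8);       % illustrative: 72 labels, 8 per class
> trnIdx = []; valIdx = []; tstIdx = [];
> for c = 1:9
>     idx = find(classLabels == c);
>     idx = idx(randperm(numel(idx)));   % shuffle the 8 vectors of class c
>     valIdx = [valIdx, idx(1)];         % 1 per class for validation
>     tstIdx = [tstIdx, idx(2)];         % 1 per class for testing
>     trnIdx = [trnIdx, idx(3:end)];     % remaining 6 per class for training
> end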
>
> I would try M designs and average the MSEs. A practical size for M
> can be determined by increasing it until the running average of the
> MSEs has stabilized, i.e., the standard deviation of the MSE is
> sufficiently low.
>
> > > > > > net = newff(minmax(P),[10 9],{'tansig','logsig'});
>
> > > > > You have only run H = 10 for one weight initialization.
> > > > > You have to find H by trial and error. For each value of H
> > > > > that you seriously consider you need to run many (or at least
> > > > > several) weight initializations.
>
> > > > Hmm... I understand that H is the number of hidden nodes. Isn't that
> > > > by any chance the same as the number of neurons in the hidden layer?
>
> > > Yes.
>
> > I tried my network with H =5,10,15,20,30 and 40. When I tried 50 nodes
> > MATLAB said "Out of Memory".
>
> Do you immediately clear variables that will not be used again (e.g.,
> matrices of dimension [5000 72], stuff left over from H = 40, etc.)?
>
> > So, the network's performance is way better (MSE is smaller) when the
> > number of hidden nodes H is higher.
>
> What are you measuring this by? Training errors must be expected
> to be biased low. Therefore you have to rely on unbiased val and test
> errors or reduce the bias in the training set error via
>
> MSEtrn = SSEtrn/(Ntrn-Nw)
>
> where
>
> Nw = (120+1)*H + (H+1)*9 = 130*H + 9
>
> UH-OH!
>
> Nw > Ntrn for every H >= 1.
>
> Therefore you will have severe cases of overfitting and must use
> the val and tst estimates.
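>
> A quick check with the sizes from this thread (120 inputs, 9 outputs,
> Ntrn = 54 training vectors):
>
> I = 120;  O = 9;  Ntrn = 54;
> H  = [1 5 10 20 40];
> Nw = (I+1)*H + (H+1)*O        % 139  659  1309  2609  5209 -- all exceed Ntrn = 54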
>
> In fact, you don't even have enough data for a good linear classifier
> (H = 0).
>
> Have you, as recommended in my pretraining advice post, designed
> a linear classifier? If so, what are MSErr and PCTerr?
>
> Have you tried reducing the number of inputs via STEPWISE?
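>
> One way to do that (an assumption on my part, not a fixed recipe) is
> STEPWISEFIT, the command-line stepwise regression in the Statistics
> Toolbox, run against one 0/1 target row at a time:
>
> % P is 120-by-N inputs, T is 9-by-N 0/1 targets; rows of the regression
> % design matrix must be observations, hence the transposes.
> [b, se, pval, inmodel] = stepwisefit(P', T(1,:)');
> selected = find(inmodel)      % inputs retained for the class-1 output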
>
> > But the learning time takes forever with 40 nodes.
>
> Yeah, Nw = 130*40 + 9 = 5209
>
> > I wish I could check the network performance for more nodes.
> > Maybe I'll have a chance at my University. Do you think performance
> > could be much better?
>
> You won't be able to tell unless you average over M (>=20) designs.
>
> However, since your system of learning equations is so severely
> underdetermined, it may be better to design 9 separate single-class
> classifiers AND try to reduce the input dimensionality of 120. After
> all, 54 training vectors only define a 53-D space.
>
> Again, do the linear classifier design 1st.
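>
> A minimal sketch of that first step (NEWLIND solves the linear
> least-squares design directly; P and T are the training inputs and
> 0/1 targets):
>
> netLin = newlind(P, T);             % design the linear classifier
> Y = sim(netLin, P);                 % 9-by-Ntrn outputs
> [maxY, assigned] = max(Y, [], 1);   % winner-take-all class assignments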
>
> Also, you may do better with a cluster-based algorithm like RBF or
> LVQ.
>
> > > -----SNIP
>
> > > > > > This is my output:
>
> > > > > > out =
>
> -----SNIP
>
> > >... this is only 7 inputs out of 72.
>
> > I divided my data set because I didn't have enough data. I used 80
> > percent of the data set for training, 10% for validation and 10% for
> > testing; that's why there are only 7 inputs.
>
> Stratified sampling and averaging over multiple designs should help.
>
> > I will gather more samples and I'll use them for my experiments because
> > I think what I have is too little.
>
> I know you have too little!
>
> > >What is your MSE and classification error rate?
>
> > For the following input CLASSES=[1 1 3 6 6 7 8],
> > And the outputs is:
> > out =
>
> > 0.1355 0.0480 0.1002 0.1160 0.0946 0.0449 0.0465
> > 0.0461 0.4316 0.1123 0.1101 0.1389 0.2679 0.1670
> > 0.2987 0.0095 0.1621 0.1314 0.2518 0.0144 0.1404
> > 0.0064 0.0696 0.4548 0.4399 0.5802 0.0231 0.3315
> > 0.0865 0.7397 0.0481 0.0699 0.0413 0.6681 0.1321
> > 0.3488 0.0133 0.1893 0.2497 0.2011 0.0151 0.1326
> > 0.3217 0.6081 0.2059 0.0921 0.1265 0.4667 0.1950
> > 0.0368 0.1137 0.2315 0.2268 0.3569 0.1217 0.2162
> > 0.0034 0.2003 0.2129 0.1779 0.2671 0.0816 0.3447
>
> > So ASSIGNED CLASSES=[6 5 4 4 4 5 9]
>
> > Corresponding targets:
> > ans =
>
> > 1 1 0 0 0 0 0
> > 0 0 0 0 0 0 0
> > 0 0 1 0 0 0 0
> > 0 0 0 0 0 0 0
> > 0 0 0 0 0 0 0
> > 0 0 0 1 1 0 0
> > 0 0 0 0 0 1 0
> > 0 0 0 0 0 0 1
> > 0 0 0 0 0 0 0
>
> > When I changed the goal to 0.01 it was:
> > TRAINLM-calcjx, Epoch 0/1000, MSE 0.56629/0.01, Gradient
> > 0.482982/1e-010
> > TRAINLM-calcjx, Epoch 11/1000, MSE 0.199231/0.01, Gradient
> > 1.42214e-005/1e-010
> > TRAINLM, Validation stop.
>
> > But when I change it back to default value the performance is:
> > TRAINLM-calcjx, Epoch 0/1000, MSE 0.285233/0, Gradient 0.346827/1e-010
> > TRAINLM-calcjx, Epoch 8/1000, MSE 0.0592876/0, Gradient
> > 0.0489449/1e-010
> > TRAINLM, Validation stop.
>
> > I think in my case classification error rate is 100% right now.
>
> Check the linear classifier.
>
> > > > What do you think about using other networks for classification? I
> > > > have to use 3 architectures. I was thinking about PNN and LVQ, because
> > > > I did some reading and it seems like these are the best choices.
>
> > > I would use NEWRB instead of PNN. See my posts on RBFNN training.
>
> > I studied your posts on RBFNN, and I have a question:
> > I have to find the optimal spread value using a loop. You wrote:
>

> > SPREAD0 = mean(MINDST)/3
>
> > MINDST = Minimum distance between clusters of different classes
> > Can you give me a little tip on what MINDST should be? Should it be the
> > distance between my cases, i.e., between my feature vectors? How do I
> > compute that?
>
> Since your data set is so small, you can obtain the distances between
> each pair of data points.
>
> help dist
>
> On larger data sets you can use KMEANS on each class and look
> at the distances between cluster means.
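>
> For a data set this small, a sketch (hypothetical names: P holds one
> case per column, classLabels holds the class of each column):
>
> D = dist(P', P);                                     % distances between every pair of cases
> diffClass = bsxfun(@ne, classLabels', classLabels);  % true where the two cases differ in class
> MINDST  = min(D(diffClass));                         % min distance between cases of different classes
> SPREAD0 = MINDST/3;                                  % starting value for the spread search
>
> The NEWRB spread loop can then try candidate values around SPREAD0,
> e.g. SPREAD0*[1 2 4 8], keeping the design with the best validation
> error.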
>
> > One more question about LVQ:
> > I don't quite understand, in the following line net =
> > newlvq(PR,S1,PC,LR,LF), what PC (the vector of typical class
> > percentages) is.
> > I found:
> > PC = (1/9)*ones(1,9); (in my case), so I'll have
> > pp =
>
> > 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111
> > Matlab says it's not good.
>
> They don't add up to 100%
>
> > Well I don't quite understand what is PC.
>
> > The last question. Do my data have to be normalized as well for newrb
> > and newlvq (for example should I perform [P,ps]=mapminmax(Pr)?)
>
> No. However, by doing so you will avoid a lot of problems in the long
> run.
>
> I prefer zero-mean/unit-variance standardization.
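>
> For example (PRESTD in older toolbox versions, MAPSTD in newer ones):
>
> [pn, meanp, stdp] = prestd(P);     % zero-mean/unit-variance inputs
> % or, in newer versions:  [pn, ps] = mapstd(P);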
>
> Hope this helps.
>
> Greg