From: Gian Piero Bandieramonte on
> My advice is equivalent to adding a linear first hidden layer with
> weight matrix T determined from training data. Then, freezing the
> net before testing.
>
> Your method is equivalent to changing the first hidden layer
> (only!) every time a new set of data is to be classified.
>
> Which sounds more reasonable to you?

I actually didn't completely understand this advice, but I sense it's
more reasonable to use your advice than the way I did it, because it
seems more efficient and less time-consuming. But in terms of
correctness, would the way I did it be acceptable?
From: Greg Heath on

Gian Piero Bandieramonte wrote:
> Greg Heath wrote:
> > My advice is equivalent to adding a linear first hidden layer with
> > weight matrix T determined from training data. Then, freezing the
> > net before testing.
> >
> > Your method is equivalent to changing the first hidden layer
> > (only!) every time a new set of data is to be classified.
> >
> > Which sounds more reasonable to you?
>
> I actually didn't completely understand this advice, but I sense it's
> more reasonable to use your advice than the way I did it, because it
> seems more efficient and less time-consuming.

And it effectively removes the test set from providing
classification parameters.

> But in terms of correctness, would the way I did it be
> acceptable?

Good question.

Two modeling/classification assumptions are:
1. The design (training+validation) and test sets are random draws
from the same stationary population probability distribution.
2. The design set is large enough to get accurate estimates of
all weights and thresholds.

If

a. Assumption 1 is correct.
b. All test sets are sufficiently large to get accurate PCA
estimates.

then your method is acceptable.

However, test sets are frequently not that large, and you would
have to defend against the charge that your results are optimistically
biased because you are effectively testing on training data.
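
In code, the distinction is something like this minimal sketch
(numpy/scikit-learn assumed; the sizes and component count are
made up):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_design = rng.normal(size=(200, 10))   # design (training+validation) set
X_test   = rng.normal(size=(20, 10))    # a small test set

# My advice: estimate the transform T from the design set only, then
# freeze it -- the test set contributes nothing to any parameter.
pca = PCA(n_components=4).fit(X_design)
Z_design = pca.transform(X_design)
Z_test   = pca.transform(X_test)        # same frozen T applied to test data

# Your method: re-estimating PCA on each new test set. With only 20
# rows the component estimates are unstable, and the test data now
# helps determine classification parameters (optimistic bias).
Z_test_refit = PCA(n_components=4).fit_transform(X_test)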

Don't forget that, in general, dominant PCA variable subset
selection may be inappropriate for classification.

Hope this helps.

Greg

From: sangdonlee on
What Greg meant is that the first PC accounts for the largest amount
of variance in the data, which is the X-axis in the diagram below, for
example (view it in a fixed-width font). However, for classification
purposes, the second PC (the Y-axis) is the better discriminator.

Greg wrote:
>>Don't forget that, in general, dominant PCA variable subset
>>selection may be inappropriate for classification.

Sangdon Lee

            |
   #########|#########
            |
<-----------+---------->
            |
   #########|#########
            |


From: Greg Heath on

sangdonlee(a)hotmail.com wrote:
> What Greg meant is that the first PC accounts for the largest amount
> of variance in the data, which is the X-axis in the diagram below, for
> example (view it in a fixed-width font). However, for classification
> purposes, the second PC (the Y-axis) is the better discriminator.
>
> Greg wrote:
> >>Don't forget that, in general, dominant PCA variable subset
> >>selection may be inappropriate for classification.
>
> Sangdon Lee
>
>             |
>    #########|#########
>             |
> <-----------+---------->
>             |
>    #########|#########
>             |
--------------------------SNIP

My example was 3-D. Therefore your diagram is, with scaling,
valid for projections onto both the x-z and y-z planes. Dominant
PCA chooses x and y and rejects z, when what you are looking
for is just the opposite.
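
A quick numeric check of that 3-D case (assumed variances,
illustration only):

import numpy as np

rng = np.random.default_rng(0)
n = 1000
means = np.array([[0., 0., +1.], [0., 0., -1.]])  # classes differ only along z
stds  = np.array([10.0, 5.0, 0.3])                # x and y dominate the variance
X = np.vstack([m + rng.normal(size=(n, 3)) * stds for m in means])

print("variances (x, y, z):", np.round(X.var(axis=0), 2))
# Prints roughly (100, 25, 1.1): dominant-variance selection keeps x and y
# and rejects z -- exactly the direction that separates the classes.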

Or, are you just presenting a simpler 2-D example?

Hope this helps.

Greg

From: sangdonlee on
Dear Greg,

Yes, I'm just presenting a simpler 2-D example for illustration
purposes.

As you already mentioned, discriminant analysis (DA) would generally
be a better choice for classification purposes, and PCA can be useful
to "understand" the data before DA.
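
For example, a rough sketch of that contrast (toy data as in my
earlier diagram; scikit-learn's LDA, purely for illustration):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([rng.normal(scale=10.0, size=2 * n),       # spread on x
                     np.concatenate([rng.normal(+2, 0.5, n),   # class split on y
                                     rng.normal(-2, 0.5, n)])])
labels = np.repeat([0, 1], n)

# PCA ranks directions by variance; LDA by class separability.
print("PC1 direction:", np.round(PCA(n_components=2).fit(X).components_[0], 2))
w = LinearDiscriminantAnalysis().fit(X, labels).coef_[0]
print("LDA direction:", np.round(w / np.linalg.norm(w), 2))
# PC1 comes out along the x-axis; the LDA direction along the y-axis.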

Sangdon
