From: Gian Piero Bandieramonte on 31 Aug 2006 10:22

> My advice is equivalent to adding a linear first hidden layer with
> weight matrix T determined from training data. Then, freezing the
> net before testing.
>
> Your method is equivalent to changing the first hidden layer (only!)
> every time a new set of data is to be classified.
>
> Which sounds more reasonable to you?

I actually didn't completely understand this advice, but I sense it's
more reasonable to use your advice than the way I did it, because it
seems more efficient and less time consuming. But if I talk in terms
of correctness, would the way I did it be acceptable?
From: Greg Heath on 31 Aug 2006 20:50

Gian Piero Bandieramonte wrote:
Greg Heath wrote:
> > My advice is equivalent to adding a linear first hidden layer with
> > weight matrix T determined from training data. Then, freezing the
> > net before testing.
> >
> > Your method is equivalent to changing the first hidden layer
> > (only!) every time a new set of data is to be classified.
> >
> > Which sounds more reasonable to you?
>
> I actually didn't completely understand this advice, but I sense it's
> more reasonable to use your advice than the way I did it, because it
> seems more efficient and less time consuming.

And it effectively removes the test set from providing classification
parameters.

> But if I talk in terms of correctness, would the way I did it be
> acceptable?

Good question. Two modeling/classification assumptions are:

1. The design (training + validation) and test sets are random draws
   from the same stationary population probability distribution.
2. The design set is large enough to yield accurate estimates of all
   weights and thresholds.

If

a. Assumption 1 is correct, and
b. all test sets are large enough to yield accurate PCA estimates,

then your method is acceptable. However, test sets are frequently not
that large, and you would have to defend against the charge that your
results are optimistically biased because you are effectively testing
on training data.

Don't forget that, in general, dominant PCA variable subset selection
may be inappropriate for classification.

Hope this helps.

Greg
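A minimal sketch of the two approaches being compared, written in
Python/NumPy since no code appears in the thread; the PCA-based
transform, function names, and data shapes are illustrative
assumptions, not the original poster's actual setup.

import numpy as np

def pca_transform(X, n_components):
    """Return the mean and projection matrix T (dominant principal
    directions) estimated from the rows of X."""
    mu = X.mean(axis=0)
    # Eigendecomposition of the sample covariance matrix
    C = np.cov(X - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]            # sort by decreasing variance
    T = evecs[:, order[:n_components]]         # columns = dominant directions
    return mu, T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))            # design (training) data
X_test = rng.normal(size=(20, 3))              # one test set

# Greg's advice: estimate T once from the design data, then freeze it;
# the same fixed linear map is applied to every test set.
mu, T = pca_transform(X_train, n_components=2)
Z_train = (X_train - mu) @ T                   # used to train the net
Z_test = (X_test - mu) @ T                     # frozen transform at test time

# The questioned method: re-estimating the transform from each new test
# set, so the test data itself supplies classification parameters.
mu_t, T_t = pca_transform(X_test, n_components=2)
Z_test_refit = (X_test - mu_t) @ T_t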
From: sangdonlee on 1 Sep 2006 08:59

What Greg meant is that the first PC accounts for the largest amount
of variance in the data, which is the X-axis in the graph below, for
example (use a fixed courier font to see it). However, for
classification purposes, the second PC (Y-axis) is the better
discriminator.

Greg wrote:
>> Don't forget that, in general, dominant PCA variable subset
>> selection may be inappropriate for classification.

Sangdon Lee

            |
  #########|#########
            |
<-----------+---------->
            |
  #########|#########
            |

Gian Piero Bandieramonte wrote:
> > My advice is equivalent to adding a linear first hidden layer with
> > weight matrix T determined from training data. Then, freezing the
> > net before testing.
> >
> > Your method is equivalent to changing the first hidden layer
> > (only!) every time a new set of data is to be classified.
> >
> > Which sounds more reasonable to you?
>
> I actually didn't completely understand this advice, but I sense it's
> more reasonable to use your advice than the way I did it, because it
> seems more efficient and less time consuming. But if I talk in terms
> of correctness, would the way I did it be acceptable?
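A small numerical sketch of this picture (the class geometry below is
made up to match the diagram, not taken from the thread): the two
classes overlap completely along the high-variance X direction and are
separated only along the low-variance Y direction, so the dominant
principal component carries essentially no class information.

import numpy as np

rng = np.random.default_rng(1)
n = 500

# Wide spread along x (large variance), narrow spread along y,
# with the class separation entirely along y.
class0 = np.column_stack([rng.normal(0, 5.0, n), rng.normal(+1.0, 0.2, n)])
class1 = np.column_stack([rng.normal(0, 5.0, n), rng.normal(-1.0, 0.2, n)])
X = np.vstack([class0, class1])

# Principal components of the pooled data
C = np.cov(X - X.mean(axis=0), rowvar=False)
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
print("variance along PC1 vs PC2:", evals[order])   # PC1 >> PC2
print("PC1 direction:", evecs[:, order[0]])          # ~ the x-axis
print("PC2 direction:", evecs[:, order[1]])          # ~ the y-axis (the discriminator)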
From: Greg Heath on 1 Sep 2006 11:30

sangdonlee(a)hotmail.com wrote:
> What Greg meant is that the first PC accounts for the largest amount
> of variance in the data, which is the X-axis in the graph below, for
> example (use a fixed courier font to see it). However, for
> classification purposes, the second PC (Y-axis) is the better
> discriminator.
>
> Greg wrote:
> >> Don't forget that, in general, dominant PCA variable subset
> >> selection may be inappropriate for classification.
>
> Sangdon Lee
>
>             |
>   #########|#########
>             |
> <-----------+---------->
>             |
>   #########|#########
>             |

--------------------------SNIP

My example was 3-D. Therefore your diagram is, with scaling, valid for
projections into both the x-z and y-z planes. Dominant PCA chooses x
and y and rejects z, when what you are looking for is just the
opposite.

Or are you just presenting a simpler 2-D example?

Hope this helps.

Greg
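A 3-D version of the same point, as a hedged sketch with synthetic
data (Greg's actual example is not shown in the thread): x and y have
large variance but no class information, z has small variance and
carries all of the separation, so keeping only the two dominant PCs
discards exactly the direction a classifier needs.

import numpy as np

rng = np.random.default_rng(2)
n = 500

def make_class(z_center):
    # Large variance along x and y, small variance along z;
    # only the z center differs between the classes.
    return np.column_stack([rng.normal(0, 5.0, n),
                            rng.normal(0, 3.0, n),
                            rng.normal(z_center, 0.2, n)])

X = np.vstack([make_class(+1.0), make_class(-1.0)])
labels = np.r_[np.zeros(n), np.ones(n)]

# Dominant-PC subset selection: keep the two largest-variance components.
C = np.cov(X - X.mean(axis=0), rowvar=False)
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
T2 = evecs[:, order[:2]]                 # spans (roughly) the x-y plane
Z2 = (X - X.mean(axis=0)) @ T2           # z, the discriminating axis, is dropped

# The class means are now nearly identical in the retained 2-D space.
print("class mean difference in retained space:",
      Z2[labels == 0].mean(axis=0) - Z2[labels == 1].mean(axis=0))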
From: sangdonlee on 1 Sep 2006 11:53
Dear Greg,

Yes, I'm just presenting a simpler 2-D example for illustration
purposes. As you already mentioned, discriminant analysis (DA) would
be a better choice for classification purposes, and PCA can be useful
to "understand" the data before DA in general.

Sangdon