From: Greg Heath on
On Apr 27, 9:26 pm, "John G" <a...(a)yahoo.com> wrote:
> Peter Perkins <Peter.Perk...(a)MathRemoveThisWorks.com> wrote in message <hr81mf$nn...(a)fred.mathworks.com>...
> > On 4/27/2010 8:04 PM, John G wrote:
> > > The LDA built into the stats toolbox appears to assume covariances equal
> > > & classes distributed normally, unlike Fisher LDA,
>
> > That _is_ Fisher LDA.  How do you define it?  If you want _unequal_ cov
> > matrices, that's quadratic discriminant analysis.
>
> I guess I was wrong then. I thought Fisher's LDA was a bit different (Wikipedia says it doesn't necessarily make the same assumptions as regular LDA).
>
> How do you implement the MatLab LDA then?
>
> [C,err,P,logp,coeff] = classify(sample,training,group,'linear')
>
> but what would you use for group and training? The example is kind of unclear. I'm uncertain what a training data set is - is it a particular subset of the m x m array you're working with or can it be generalized to something else or what?

In general, the total data set is partitioned into three subsets:

training:   design data used to directly determine the weights, given
            the training parameters (e.g., the percentage of data used
            for each of the subsets, and/or the prior probability
            weighting and misclassification costs for each class).

validation: nontraining design data used repeatedly to estimate
            predictive performance so that the training parameters
            can be optimized.

test:       nondesign data used once and only once to obtain an
            unbiased estimate of predictive performance on unseen
            data.

If you wish to retrain because the test set performance is
significantly worse than the validation set performance, you should
repartition the data to try to keep the new test results as unbiased
as possible. Typically, I find that when this happens, 10-fold
crossvalidation is a better alternative.
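For reference, k-fold crossvalidation just rotates which tenth of the data serves as the held-out set; a sketch of the index generation (again in Python/NumPy for illustration):

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation:
    each fold serves exactly once as the held-out test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```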

Hope this helps.

Greg

From: Peter Perkins on
On 4/27/2010 9:26 PM, John G wrote:

> I guess I was wrong then. I thought Fisher's LDA was a bit different
> (Wikipedia says it doesn't necessarily make the same assumptions as
> regular LDA).

John, I guess I fall into the "The terms Fisher's linear discriminant
and LDA are often used interchangeably" crowd that Wikipedia describes.
But I think it's kind of like the "least squares vs. maximum
likelihood for the normal distribution" distinction: you end up in the
same place, but from two different justifications.

By the way, the references that I looked at disagree with the Wikipedia
article in that they use a pooled cov estimate for "Fisher's LDA". I
was unaware that "Fisher's original article actually describes a
slightly different discriminant, which does not make some of the
assumptions of LDA such as ... equal class covariances". I take it on
faith that the author of that statement read the paper, because I
haven't. Unequal covariances lead to QDA if you look at likelihood ratios.

You might consider using a classification tree if you're concerned about
the assumptions of LDA.


> How do you implement the MatLab LDA then?
>
> [C,err,P,logp,coeff] = classify(sample,training,group,'linear')
>
> but what would you use for group and training? The example is kind of
> unclear. I'm uncertain what a training data set is - is it a particular
> subset of the m x m array you're working with or can it be generalized
> to something else or what?

The training data are a set of observations for which you know both the
predictor variables _and the class in which each observation falls_.
Let's say you were trying to classify whether or not a new patient has a
disease. You have records of 100 patients who are known to have it or
not, as well as various other pieces of information about them. All of
that is your training data. When you get a new patient, all you know is
the "other information" and you try to predict in advance whether they
will fall into the "disease" class or the "no disease" class.
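To make the "train on labeled records, predict for a new patient" idea concrete, here is a rough sketch of the pooled-covariance linear discriminant computation (equal priors assumed for simplicity; written in Python/NumPy for illustration, not the actual `classify` implementation):

```python
import numpy as np

def lda_classify(sample, training, group):
    """Linear discriminant with a pooled covariance estimate and equal
    priors -- a sketch of what classify(sample, training, group, 'linear')
    computes, not its actual implementation."""
    classes = np.unique(group)
    means = np.array([training[group == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance estimate
    n = training.shape[0]
    pooled = sum(np.cov(training[group == c], rowvar=False)
                 * (np.sum(group == c) - 1)
                 for c in classes) / (n - len(classes))
    inv = np.linalg.inv(pooled)
    # Linear score per class: m' S^-1 x - (1/2) m' S^-1 m
    scores = sample @ inv @ means.T \
             - 0.5 * np.einsum('ij,jk,ik->i', means, inv, means)
    return classes[np.argmax(scores, axis=1)]

# Toy "training" records with known class labels, then two new cases
training = np.array([[0., 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
group = np.array([1, 1, 1, 2, 2, 2])
pred = lda_classify(np.array([[0.5, 0.5], [5.5, 5.5]]), training, group)
```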

From: Ting Su on
John,

By Fisher LDA, you probably mean a feature extraction (dimension
reduction) method that tries to find a projection maximizing the class
separability. The linear discriminant analysis function in MATLAB's
statistics toolbox is intended for classification (no projection is
involved). Both methods are referred to as LDA in the literature and
share a common assumption, but they are different methods. It looks
like "Fisher's LDA" is usually used to refer to the feature extraction
method.
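In the two-class case, that feature-extraction view has a closed form: the Fisher direction is w proportional to Sw^-1 (m1 - m2), where Sw is the within-class scatter matrix. A sketch in Python/NumPy (illustrative only; this is not the stprtool interface):

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher discriminant direction w = Sw^{-1} (m1 - m2),
    where Sw is the within-class scatter matrix."""
    X1, X2 = X[y == 1], X[y == 2]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)  # unit-length projection direction

# Two toy clusters; projecting onto w separates them on a single axis
X = np.array([[0., 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([1, 1, 1, 2, 2, 2])
w = fisher_direction(X, y)
z = X @ w  # 1-D projected features
```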




-Ting

"John G" <asdf(a)yahoo.com> wrote in message
news:hr7gbg$8gv$1(a)fred.mathworks.com...
> I've got an m x m array. I want to apply Fisher discriminant analysis to
> it - the LDA in MatLab's stats toolbox isn't the Fisher one so I used the
> version provided by the supplementary toolbox stprtool package.
> http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html
>
> How do I run my program? I don't really understand the input:
>
> data [struct] Binary labeled training vectors.
> .X [dim x num_data] Training vectors.
> .y [1 x num_data] Labels (1 or 2)
>
>
> Also, I'm not really sure I understand the concept of a training data set.
> Does it have to be a subset of the array you want to analyze or a general
> data set?


From: Ting Su on
John,
It turns out that the function "classify" in MATLAB's statistics
toolbox returns, as its fifth output, the coefficients describing the
boundary between the regions separating each pair of groups, so this
function can be used to find the projection maximizing the class
separability: the projection direction is orthogonal (normal) to the
decision boundary.

-Ting
