From: Tom Lane on
> I have looked at the 'fisheriris' example in the MATLAB help. There, the
> cell array 'species' contains only the names of the flower species. My
> difficulty is understanding how the classregtree command classifies those
> flowers into 3 different species using just the matrix 'meas' of
> measurement data and a cell array containing only the species names.
> Doesn't this command require some extra deciding parameters to classify
> those flowers?
> Now, if I collect the same measurements (i.e. sepal length, sepal width,
> petal length and petal width) for a new flower whose class is not known,
> how can I identify the class of this flower using the CART algorithm?

The algorithm is going to come up with a series of rules, such as

1 if PL<2.45 then node 2 else node 3
2 class = setosa
3 if PW<1.75 then node 4 else node 5
and so on

(PL and PW are petal length and petal width), such that the classes these
rules assign to the input data match the known classes well. Then it applies
those same rules to your new data to work out which class the tree predicts
for it.
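
As an illustration of how such rules are applied to a new flower, here is a minimal Python sketch. It is not classregtree's code. The function name `predict` and the sample measurements are made up for the example, and because the rules above stop at "and so on", nodes 4 and 5 are left as unresolved placeholders rather than given a class.

```python
# Sketch only: the two splits come from the rules quoted above. Nodes 4 and 5
# are placeholders, because the post elides the rest of the tree ("and so on").
def predict(petal_length, petal_width):
    """Walk the example rules for one flower and return where it ends up."""
    # Node 1: if PL < 2.45 then node 2 else node 3
    if petal_length < 2.45:
        return "setosa"  # Node 2: class = setosa
    # Node 3: if PW < 1.75 then node 4 else node 5
    if petal_width < 1.75:
        return "node 4 (rest of tree not shown)"
    return "node 5 (rest of tree not shown)"


# A made-up flower with short petals follows node 1 -> node 2.
print(predict(1.4, 0.2))  # prints "setosa"
```

Only the measurements the rules actually test are needed to follow a path; the class comes from whichever terminal node the flower reaches.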

You ask about extra parameters. There are parameters that you can specify to
control, for example, the size of the tree. But the method makes few of the
typical distributional assumptions. This contrasts with linear discriminant
analysis, which assumes normally distributed data and estimates means and
covariances.

If you need details, the book "Classification and Regression Trees" by
Breiman et al. provides them.

-- Tom


From: Mohiyuddin on
"Tom Lane" <tlane(a)mathworks.com> wrote in message <hra4lm$47p$1(a)fred.mathworks.com>...

Thank you once again, Tom. I have almost understood what CART does. My problem is that I am unable to understand how the algorithm generates rules such as
> 1 if PL<2.45 then node 2 else node 3
> 2 class = setosa
> 3 if PW<1.75 then node 4 else node 5
> and so on
Does it create these rules on its own, or do we have to specify them (i.e. the values 2.45 and 1.75)? Those values are the parameters I was asking about: do we have to specify them, or does the algorithm decide them on its own?

I searched for the book by Breiman online but couldn't find it. Can you point me to any website where I can download it for free?

I would also like to ask for another great help: I want to know the command used to classify new data once a classification tree has been created from data with known classes. Please help me out, because I am running out of time.

---- Mohiyuddin


From: Tom Lane on
>> 1 if PL<2.45 then node 2 else node 3
>> 2 class = setosa
>> 3 if PW<1.75 then node 4 else node 5
>> and so on
> Does it create these rules on its own, or do we have to specify them
> (i.e. the values 2.45 and 1.75)? Those values are the parameters I was
> asking about: do we have to specify them, or does the algorithm decide
> them on its own?

You definitely don't provide these numbers. They're calculated by the
algorithm to do things like find the best separation between groups at each
step.
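
To make "best separation" concrete, here is a minimal Python sketch of one common way a tree can pick a cut value: try the midpoints between adjacent data values and keep the one with the lowest weighted Gini impurity. This is an assumption for illustration, not a description of what classregtree does internally. The function names and the six data points are made up; they are not the fisheriris data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values and
    return (threshold, weighted impurity) for the purest two-way split."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # no cut fits between equal values
        cut = (lo + hi) / 2
        left = [lab for v, lab in pairs if v < cut]
        right = [lab for v, lab in pairs if v >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (cut, score)
    return best


# Made-up petal lengths, for illustration only (not the fisheriris data).
petal_length = [1.4, 1.5, 1.9, 3.0, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]
print(best_threshold(petal_length, species))
```

For this toy data the chosen cut lands close to 2.45, the midpoint between the largest setosa value and the smallest non-setosa value. A full tree would repeat the same search over every variable, and then again inside each resulting group.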

>> I searched for the book by Breiman online but couldn't find it. Can you
>> point me to any website where I can download it for free?

I doubt you will find this book for free. If you do a Google search for
"classification and regression trees" you may find some information, though.

>>> I would also like to ask for another great help: I want to know the
>>> command used to classify new data once a classification tree has been
>>> created from data with known classes. Please help me out, because I am
>>> running out of time.

help classregtree/eval

-- Tom


From: Mohiyuddin on
"Tom Lane" <tlane(a)mathworks.com> wrote in message <hrcnrd$c3s$1(a)fred.mathworks.com>...

Thank you again, Tom. You have cleared up all my doubts regarding CART. But I am facing another problem with my project. When I submit new data (a vector) to a classification tree using the eval command, I get only the class name for that data, but I also want the probability of that classification, i.e. the probability with which the data belongs to the class. Is there a command that calculates this probability?

I also have to apply a threshold 't' to this probability 'p' so that a variable x satisfies the following condition:
x = 1 if p > t, else x = 0.
But I don't know how to choose this threshold value for my data. Can you please help me out?
--- Mohiyuddin
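
The thresholding rule in the post can be sketched in a few lines of Python. The sketch assumes a common convention for trees, namely that the probability of a class in a terminal node is the fraction of training observations of that class in the node; the thread does not say how classregtree reports this. The function names, the class counts and the value of `t` are all hypothetical.

```python
def leaf_probabilities(class_counts):
    """Turn the training counts in one terminal node into class fractions."""
    total = sum(class_counts.values())
    return {name: count / total for name, count in class_counts.items()}


def indicator(p, t):
    """x = 1 if p > t, else x = 0, as stated in the post."""
    return 1 if p > t else 0


# Hypothetical terminal node: 45 training flowers of one class, 5 of another.
counts = {"versicolor": 45, "virginica": 5}
probs = leaf_probabilities(counts)

p = probs["versicolor"]
t = 0.5  # arbitrary example value, not a recommendation
print(p, indicator(p, t))  # prints "0.9 1"
```

The sketch does not answer how to choose `t`; that depends on how costly each kind of misclassification is for the data at hand.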