From: Saari Saari on
Dear Tom,

I discovered the dummyvar function in MATLAB and tried it on a simple example, without glmfit or glmval yet. However, as a beginner, I don't understand what you meant by

"....One thing to watch out for is that if you include dummy variables that are
indicators for every group value, then the set of them will be collinear
with the constant term. You can either omit one of them or use the
'constant' parameter of glmfit to omit the constant term....."

Since my model is linear (logistic, where the power is just 1), what should I do with the constant term here? The MATLAB tutorial only shows a quadratic model. It would be clearer to me if you could explain with a simple example that includes the glmfit and glmval functions. I mean, how do you pass the dummyvar output to glmfit and glmval? Sorry to ask again, but I can't see how to pass it to these two functions, or what the code and data would look like. I hope you can help. I really appreciate it. Thank you.
From: Tom Lane on
> Since my model is linear (logistic, where the power is just 1), what
> should I do with the constant term here? The MATLAB tutorial only shows a
> quadratic model. It would be clearer to me if you could explain with a
> simple example that includes the glmfit and glmval functions. I mean, how
> do you pass the dummyvar output to glmfit and glmval? Sorry to ask again,
> but I can't see how to pass it to these two functions, or what the code
> and data would look like. I hope you can help. I really appreciate it.
> Thank you.

The logistic model is

log(p/(1-p)) = b0 + b1*x1 + b2*x2 + ...

I'm talking about the right-hand side of this equation. Suppose x1 and x2
represent dummy variables for two groups, and there are no other predictors.
Then the right-hand side is

b0 + b1 for group 1
b0 + b2 for group 2

There's no way to estimate all of these coefficients. For example, given any
coefficient values I could add 10 to b0 and subtract 10 from each of b1 and
b2 without changing these sums. So there's no unique way to define the
coefficients.
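
A minimal sketch of the same point in code, using only base MATLAB. The group labels and coefficient values are made up for illustration. It checks that the constant column plus both indicator columns is rank deficient, and that shifting the coefficients by 10 as described leaves the right-hand side unchanged:

```matlab
% Hypothetical group labels for six observations in two groups.
g = [1 1 1 2 2 2]';
D = [g == 1, g == 2];            % one indicator column per group
X = [ones(numel(g), 1), D];      % constant column plus both indicators

% The two indicator columns sum to the constant column, so X has
% three columns but only rank 2.
rankX = rank(X);

% Two different coefficient vectors [b0; b1; b2], arbitrary values.
bA = [0.5; 1.0; -2.0];
bB = bA + [10; -10; -10];        % add 10 to b0, subtract 10 from b1 and b2

% Both give the same linear predictor for every observation.
etaA = X * bA;
etaB = X * bB;
maxDiff = max(abs(etaA - etaB));
```

Since the data cannot distinguish bA from bB, one column has to go: either the constant or one of the indicators.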

Here's an example. I get the same estimated proportions for each group
whether I just compute the proportions directly or I fit a logistic model:

>> load carsmall
>> heavy = (Weight>3000);
>> grpstats(heavy,Origin)'
ans =
0.6087 0.5000 0 0.1111 0.5000 0
>> b = glmfit(dummyvar(grp2idx(Origin)),heavy,'binomial','constant','off');
Warning: Iteration limit reached.
> In glmfit at 355
>> glmval(b,eye(6),'logit','constant','off')'
ans =
0.6087 0.5000 0.0000 0.1111 0.5000 0.0000
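
For comparison, here is a sketch of the other option: keep the constant term and omit one of the dummy variables. It assumes the Statistics Toolbox and the same carsmall data. The variable names idx, D, Xnew and phat are just illustrative, and the choice of the first group as the reference is arbitrary:

```matlab
load carsmall                      % sample data from the Statistics Toolbox
heavy = double(Weight > 3000);     % 0/1 response, as in the example above
idx = grp2idx(Origin);             % group index 1..6 for each car
D = dummyvar(idx);                 % one indicator column per Origin group

% Drop the first indicator column and keep glmfit's default constant term.
% Group 1 becomes the reference group.
b = glmfit(D(:, 2:end), heavy, 'binomial');

% b(1) is the log-odds for group 1; b(2:6) are the differences in
% log-odds between groups 2..6 and group 1.
% Build one row per group: all zeros for group 1, a single 1 otherwise.
Xnew = [zeros(1, 5); eye(5)];
phat = glmval(b, Xnew, 'logit')';  % fitted proportion for each group
```

The fitted proportions in phat should agree with the grpstats output, up to convergence tolerance. A similar iteration-limit warning can be expected here, because two of the groups have no heavy cars and their log-odds have no finite estimate.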

-- Tom


From: Saari Saari on
Dear Tom,

Thank you. Am I interpreting your example correctly: heavy is the response, and it marks any car that weighs more than 3000? Then, when you include this code,

grpstats(heavy,Origin)'
> ans =
> 0.6087 0.5000 0 0.1111 0.5000 0

what does it mean? Do the dummy variables come from the Origin group, with heavy as the response? And how come there are 5 coefficients for 2 groups of dummy variables? Or do they have some other meaning?

From: Tom Lane on
> Thank you. Am I interpreting your example correctly: heavy is the
> response, and it marks any car that weighs more than 3000? Then, when you
> include this code,
>
> grpstats(heavy,Origin)'
>> ans =
>> 0.6087 0.5000 0 0.1111 0.5000 0
>
> what does it mean? Do the dummy variables come from the Origin group, with
> heavy as the response? And how come there are 5 coefficients for 2 groups
> of dummy variables? Or do they have some other meaning?

This isn't doing logistic regression. This just computes the sample
proportion of "heavy" cars for each of the six Origin groups. I was just
demonstrating that logistic regression on the dummy variables for these
groups produces the same results. Later on in my post, when I do logistic
regression, there are six groups (not two).
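
To make the grpstats step concrete, here is a rough sketch of the same kind of calculation in base MATLAB, on a small made-up data set (the labels and responses below are hypothetical, not taken from carsmall). It just averages the 0/1 response within each group:

```matlab
% Hypothetical data: an origin label and a 0/1 "heavy" flag for eight cars.
origin = {'USA'; 'USA'; 'USA'; 'Japan'; 'Japan'; ...
          'Germany'; 'Germany'; 'Germany'};
heavy  = [1; 1; 0; 0; 0; 1; 0; 1];

% Map each label to a group index. unique sorts the names, so the
% groups come out as Germany, Japan, USA.
[names, ~, idx] = unique(origin);

% The mean of a 0/1 response within a group is the proportion of ones,
% which is the proportion of heavy cars in that group.
props = accumarray(idx(:), heavy, [], @mean)';
```

Each entry of props is the proportion for one group, in the order given by names. Note that this sketch orders the groups alphabetically, which need not match the group order in the grpstats output.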

-- Tom