From: Tom Lane on
> Thanks a lot, Tom, for helping me clear up my doubts. By the way, I still
> have a question to ask you. I have tried several other data sets with this
> logistic regression model. However, I noticed that when the data need a lot
> of iterations before converging, the final values of the coefficients seem
> to be wrong. I know the values are wrong because I tried the same data in
> two other logistic regression packages. My friend told me this is because
> of underfitting and overfitting. If so, what should I adjust or modify in
> order to get the same values for the same data in MATLAB?

Sorry, you're just not giving me enough information here. If you want to
e-mail me some data and your glmfit call, feel free to do that.

-- Tom


From: Saari Saari on
"Tom Lane" <tlane(a)mathworks.com> wrote in message <hi4rnn$2oh$1(a)fred.mathworks.com>...

Dear Tom,
I'm sorry for not giving enough information. Here is an example of my data, which looks quite simple:

>> x=[1 2 2 2 3 1 1 1 1 1;1 2 2 1 2 2 1 1 2 1]';
>> y=[0 1 1 0 0 1 1 0 1 0]';
>> b=glmfit(x,[y ones(10,1)],'binomial','link','logit')
Warning: error.
> In glmfit at 355

b =

1.0e+003 *

-0.4504
-0.9319
1.3812

As you can see, there is a warning ("In glmfit at 355"), which makes the coefficient values wrong. The correct coefficient values (from another logistic regression package) are:
Coefficients and Standard Errors...
Variable    Coeff.     StdErr       p
1           -38.8756   12233.5680   0.9975
2            59.4756   17671.0729   0.9973
Intercept   -21.6986

I only realised this after finding that these data required a lot of iterations before converging. With my previous data, which needed fewer iterations, the coefficients are correct. I don't know what the problem is here, since it gives the wrong coefficient values.
I hope you can help me solve this, and if the code needs any modification, I hope you will let me know. Thanks.
From: Tom Lane on
> As you can see, there is a warning ("In glmfit at 355"), which makes the
> coefficient values wrong. The correct coefficient values (from another
> logistic regression package) are:
> Coefficients and Standard Errors...
> Variable    Coeff.     StdErr       p
> 1           -38.8756   12233.5680   0.9975
> 2            59.4756   17671.0729   0.9973
> Intercept   -21.6986
>
> I only realised this after finding that these data required a lot of
> iterations before converging. With my previous data, which needed fewer
> iterations, the coefficients are correct. I don't know what the problem is
> here, since it gives the wrong coefficient values.

Both are wrong, in the sense that no finite answer is correct. The
mathematically perfect answer is a limit: the coefficient vector goes to
infinity along a particular direction, with the ratios between the
coefficients settling to fixed values.

Take a look at the following. Your data are nearly separable, meaning there
is a linear combination of X columns with all y=0 on one side and all y=1 on
the other, except at a single spot where the combination yields a
non-degenerate proportion of y=1.
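
The near-separation can be checked directly by tabulating the outcomes at each distinct predictor combination. The following is a minimal sketch using only base MATLAB functions; the variable names `ux`, `n`, `k`, and `mixed` are illustrative and do not come from the thread.

```matlab
x = [1 2 2 2 3 1 1 1 1 1; 1 2 2 1 2 2 1 1 2 1]';
y = [0 1 1 0 0 1 1 0 1 0]';

% Group identical rows of x and count outcomes within each group
[ux, ~, idx] = unique(x, 'rows');   % ux: distinct (x1,x2) combinations, sorted
n = accumarray(idx(:), 1);          % number of observations per combination
k = accumarray(idx(:), y);          % number of y=1 per combination

disp([ux n k])                      % columns: x1, x2, n, count of y=1

% Combinations where both outcomes occur; all others are all 0 or all 1
mixed = ux(k > 0 & k < n, :);
disp(mixed)
```

For these data only the combination (1,1) has mixed outcomes: one y=1 out of four observations (rows 1, 7, 8 and 10). That is the single spot referred to above.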

You can't go too far wrong, though, with either set of coefficients. They
produce very similar results.

-- Tom

%% The data, and estimated coefficients from MATLAB and elsewhere
x=[1 2 2 2 3 1 1 1 1 1;1 2 2 1 2 2 1 1 2 1]';
y=[0 1 1 0 0 1 1 0 1 0]';
b=glmfit(x,[y ones(10,1)],'binomial','link','logit')
b1 = [-21.6986 -38.8756 59.4756]';

%% Both give nearly identical fits, and perfect except obs 1,7,8,10
row = (1:length(y))';
yhat = glmval(b,x,'logit');
yhat1 = glmval(b1,x,'logit');
plot(row,y,'ro',row,yhat,'b-',row,yhat1,'m:')

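%% Compute the linear combination of X columns for each fit
figure;                          % new window, so the plot above is kept
subplot(2,1,1);                  % top: coefficients from MATLAB
xb = b(1) + x*b(2:3);            % linear predictor, intercept first
[sxb,t] = sort(xb);
plot(sxb,y(t),'ro',sxb,yhat(t),'b-')

subplot(2,1,2);                  % bottom: coefficients from the other package
xb1 = b1(1) + x*b1(2:3);
[sxb,t] = sort(xb1);
plot(sxb,y(t),'ro',sxb,yhat1(t),'b-')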
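
To see what a coefficient vector "going to infinity" means for these data, the sketch below evaluates the log likelihood along one diverging path. It is an illustration under assumptions, not something from the thread: the starting point `b0`, the direction `d`, and the step sizes `t` are chosen by hand so that the linear predictor at the mixed spot x = (1,1) stays at logit(1/4) while every other observation is pushed toward its observed outcome. Other directions would work as well. It uses only base MATLAB.

```matlab
x = [1 2 2 2 3 1 1 1 1 1; 1 2 2 1 2 2 1 1 2 1]';
y = [0 1 1 0 0 1 1 0 1 0]';

% Bernoulli log likelihood for coefficients b = [intercept; slope1; slope2]
eta    = @(b) b(1) + x*b(2:3);
loglik = @(b) sum(y .* eta(b) - log1p(exp(eta(b))));

b0 = [log(1/3); 0; 0];   % assumed start: fitted probability 1/4 everywhere
d  = [-1; -2; 3];        % assumed direction: zero at x=(1,1), separating elsewhere
t  = [0 1 5 10 20];      % assumed step sizes along the direction

ll = arrayfun(@(s) loglik(b0 + s*d), t);

% Upper bound: three y=0 and one y=1 at x=(1,1), all other points fit exactly
llmax = 3*log(3/4) + log(1/4);

fprintf('%6g  %.10f\n', [t; ll]);
fprintf(' bound  %.10f\n', llmax);
```

The log likelihood increases with every step and approaches the bound from below, so an iterative fit has no finite point at which to settle and stops wherever its iteration limit or tolerance cuts it off.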


From: Saari Saari on
"Tom Lane" <tlane(a)mathworks.com> wrote in message <hi7ffm$4n5$1(a)fred.mathworks.com>...


Dear Tom,
I appreciate your explanation of the above. With the same data that I gave you earlier,
x=[1 2 2 2 3 1 1 1 1 1;1 2 2 1 2 2 1 1 2 1]';
y=[0 1 1 0 0 1 1 0 1 0]';
b=glmfit(x,[y ones(10,1)],'binomial','link','logit')

when I run it again in MATLAB, it still does not produce the same result as the 'b1' you gave:
b1 = [-21.6986 -38.8756 59.4756]';
The same warning appears, saying that the iteration limit was reached. I really have no idea how you produced the coefficients in b1. You told me earlier that my data are nearly separable, but I am still trying to get the same coefficients as b1, which you said you obtained. Did you use the same glmfit call for b1? I guess you used different code for b1, not
b=glmfit(x,[y ones(10,1)],'binomial','link','logit').
Am I right?
I hope you can help me, because I am not very good at statistics. I am sorry.
From: Tom Lane on
> when I run it again in MATLAB, it still does not produce the same result
> as the 'b1' you gave:
> b1 = [-21.6986 -38.8756 59.4756]';
> The same warning appears, saying that the iteration limit was reached. I
> really have no idea how you produced the coefficients in b1. You told me
> earlier that my data are nearly separable, but I am still trying to get
> the same coefficients as b1, which you said you obtained. Did you use the
> same glmfit call for b1? I guess you used different code for b1, not

Saari, the b1 value is what I got from your posting, not from a MATLAB
command.

Why is it that you need to get this value? It appears that MATLAB and the
other software simply performed a different number of iterations for this
singular problem where the true answer is infinite. In fact, it looks like
MATLAB performed more iterations and got a slightly better (higher, less
negative) value for the log likelihood function:

>> sum(log(binopdf(y,1,glmval(b,x,'logit')))) % from glmfit
ans =
-2.24934057847523
>> sum(log(binopdf(y,1,glmval(b1,x,'logit')))) % from other software
ans =
-2.24934058917271

Of course I just have a few digits of precision for b1, copied from your
posting.

-- Tom
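
The comparison above can also be reproduced without `binopdf` or `glmval`. The sketch below is an addition that uses only base MATLAB: it evaluates the Bernoulli log likelihood for the `b1` values quoted in the thread and compares it with the limit value that no finite coefficients can exceed for these data. The names `eta1`, `ll1`, and `llmax` are illustrative.

```matlab
x = [1 2 2 2 3 1 1 1 1 1; 1 2 2 1 2 2 1 1 2 1]';
y = [0 1 1 0 0 1 1 0 1 0]';
b1 = [-21.6986 -38.8756 59.4756]';   % from the other package, as posted

% log L = sum( y*eta - log(1+exp(eta)) ), with eta the linear predictor
eta1 = b1(1) + x*b1(2:3);
ll1  = sum(y .* eta1 - log1p(exp(eta1)));

% Limit value: x=(1,1) has one y=1 in four observations, the rest fit exactly
llmax = log(1/4) + 3*log(3/4);

fprintf('b1:    %.14f\n', ll1);
fprintf('limit: %.14f\n', llmax);
fprintf('gap:   %.3g\n', llmax - ll1);
```

The gap is on the order of 1e-8, which matches the difference between the two log likelihood values reported above. Both coefficient sets are therefore already very close to the best achievable fit.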