Heckman Selection Model [SAS]

From: cathy on 30 Mar 2007 15:39

It seems that there are at least two ways of fitting a Heckman
selection model in SAS. One way is to do it in two steps: in the
first step, calculate the inverse Mills ratio, and in the second step,
include it in the outcome regression. The other way is to estimate the
two equations jointly, which also produces an estimate of the
correlation coefficient rho.

Does anyone know how they differ and under what circumstances each
should be used?

Thanks a lot,
Cathy

From: shiling99 on 30 Mar 2007 16:44

Heckman showed that, under the bivariate normal assumption, the two-step
approach and ML are equivalent in large samples. The two-step approach was
also much easier to compute back in the late 1970s. That is all the
difference I see.

HTH

From: cathy on 31 Mar 2007 00:25

Do you mean that they are identical? What about the choice of
variables in the two equations? I tried to estimate the two equations with
identical independent variables but got a rho of 1. Is that because
they cannot be identical?

Thanks,
Cathy

From: shiling99 on 2 Apr 2007 11:14

On Mar 31, 12:25 am, "cathy" <ly_...(a)hotmail.com> wrote:
> Do you mean that they are identical?
It means that both estimates converge to the same limit (the true
parameter) in large samples.

>How about the selection of variables in the two equations.

I am not sure what you are referring to. Here is the standard Heckman
incidental selection model setup:

Selection mechanism: z* = gamma*w + u
                     z = 1 if z* > 0; otherwise z = 0
Regression model:    y = beta*x + e
                     y is observed only if z = 1
(u, e) are bivariate normally distributed.
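For reference, the ML (joint) approach maximizes the following per-observation log-likelihood. This is the standard form, written here assuming the usual probit normalization Var(u) = 1, with sigma the standard deviation of e and rho = corr(u, e). In the NLMIXED program in this post, gamma*w corresponds to c+d*w, beta*x to a+b*x, sigma to s, and rho to r.

```latex
\ell_i = (1 - z_i)\,\log\!\left[1 - \Phi(\gamma w_i)\right]
  + z_i \left\{ \log\!\left[\frac{1}{\sigma}\,\phi\!\left(\frac{e_i}{\sigma}\right)\right]
  + \log \Phi\!\left(\frac{\gamma w_i + \rho\, e_i/\sigma}{\sqrt{1-\rho^2}}\right) \right\},
\qquad e_i = y_i - \beta x_i
```

Observations with z = 0 contribute only the probit term. For z = 1, the normal density of y is multiplied by the probability of selection conditional on e (u given e is normal with mean rho*e/sigma and variance 1 - rho^2), which is where rho enters.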

>I tried to estimate two equations with
> identical independent variables, but get a rho of 1. Is that because
> they cannot identical?

The two equations are related through u and e, which are assumed to be
bivariate normally distributed. rho is defined as the correlation between u
and e, not between w and x. It should be fine if w and x are the same.

There is no problem in estimating the selection equation, but OLS on the
regression equation will give biased estimates because it only uses the
subsample with z=1.
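A standard derivation (again assuming Var(u) = 1, sigma = sd(e), rho = corr(u, e)) shows where the bias comes from, with lambda = phi/Phi being the inverse Mills ratio:

```latex
E[\,y \mid x, z = 1\,] = \beta x + E[\,e \mid u > -\gamma w\,]
  = \beta x + \rho\,\sigma\,\lambda(\gamma w),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

OLS of y on x in the z = 1 subsample omits the lambda term, which here (x = w) is correlated with x, so OLS is biased unless rho = 0. The two-step approach estimates gamma by probit and then regresses y on x and lambda(gamma_hat*w); this is what the imr variable in the code does, and its coefficient estimates rho*sigma. In the simulation, z = 1 exactly when 1 + 2w - e1 > 0, so u = -e1, gamma*w = 1 + 2w, and corr(u, e2) = -0.8 rather than +0.8. With sigma = 1, the coefficient on imr and the r in the NLMIXED program should therefore come out near -0.8.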

Here is an example in which x is set to be the same as w.

HTH.

*create bivariate normal errors (e1,e2) with sigma1=sigma2=1 and corr=rho,
 since a1**2+a2**2=1 and a1**2-a2**2=rho;
%let rho=0.8;
%let size=10000;

data binormal;
rho=&rho;
a1=sqrt((1+rho)/2);
a2=sqrt((1-rho)/2);
do i=1 to &size;
rd1=rannor(12390);
rd2=rannor(12390);
e1=a1*rd1+a2*rd2;
e2=a1*rd1-a2*rd2;
output;
end;
run;

*verify the sample data;
proc corr;
var e1 e2;
run;

data simu_data;
set binormal;
*participation eq;
w=rannor(12340);
z=(1+2*w>e1);
*observe y;
*x=rannor(12340);
x=w;
if z=1 then y=3+3*x+e2;
else y=.;
*err=0;
run;

title '>>>>selection-biased results with OLS<<<<';
proc reg data=simu_data;
model y=x;
where y ne .;
run;
quit;

title '2-step approach: (1) probit model, (2) inverse Mills ratio';
proc logistic data=simu_data desc;
model z=w/link=probit;
output out=simu_data2 xbeta=xbeta;
run;

*calculate the inverse Mills ratio from the probit linear predictor;
data simu_data2;
set simu_data2;
imr=pdf('NORMAL',xbeta)/cdf('NORMAL',xbeta);
run;

proc reg data=simu_data2;
model y=x imr;
where y ne .;
run;
quit;

title 'results from Heckman approach --- QLIM';
proc qlim data=simu_data;
model z = w /discrete (d=normal);
model y = x / select(z=1);
run;

title 'results from Heckman approach --- NLMIXED';
proc nlmixed data=simu_data;
bounds s >0, -1<r<1;
parms a=2 b=2 c=1 d=1 s=1 r=0.5;
*selection function;
xbeta=c+d*w;
p=probnorm(xbeta);
if z=0 then l=log(1-p);
else if z=1 then do;
e=y-(a+b*x);
l2=(1/(sqrt(2*3.1415927)*s))*exp(-(e**2)/(2*s**2));
l3=probnorm((xbeta+r*e/s)/sqrt(1-r**2));
l=log(l2)+log(l3);
end;
MODEL z ~general(l);

run;

title 'results from Heckman approach --- PROC MODEL';
proc model data=simu_data;
bounds s>0, -1<r<1;
*r is the correlation between u and e;
parms a=2 b=2 c=1 d=1 s=1 r=0.5;
*selection function;
xbeta=c+d*w;
p=probnorm(xbeta);
*calculate the likelihood from the observed z;
if z=0 then l=log(1-p);
else if z=1 then do;
e=y-(a+b*x);
l2=(1/(sqrt(2*3.1415927)*s))*exp(-(e**2)/(2*s**2));
l3=probnorm((xbeta+r*e/s)/sqrt(1-r**2));
l=log(l2)+log(l3);
end;
l=-1*l;
*assign the equation for z only after z has been used above;
z=p;
ERRORMODEL z ~general(l);
fit z / CONVERGE=1e-8;
run;
quit;

From: Stéphane COLAS on 2 Apr 2007 11:47

Hi,

My two cents...

Did you know that PROC QLIM lets you fit the Heckman model directly?

Best regards,

Stephane COLAS

**************************
Société Datametric

Our site
http://www.datametric.fr
**************************

