From: Ryan on
On Jan 7, 2:31 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> Christine,
>
> Apparently, you have a case/control design since you are
> using a STRATA statement.  You also indicate that you have
> 80,000 case records and 80,000 control records which would
> suggest further that you might have a 1:1 matched study.
> If so, then you can restructure your data so that you can
> use a simple logistic regression.  That should solve your
> out-of-memory problem.
>
> So, if you have a 1:1 matched design, here is what you can
> do.  First, merge the matched case and control records
> by stratum (subjid) renaming the exposure variable so that
> you have a case exposure variable and a control exposure
> variable.  We want to compute the difference between the
> two exposure variable values.  At the same time, you need
> to construct a new response variable which has value 1
> for ALL records.
>
> With the restructured data, you can fit the conditional
> logistic regression model for the 1:1 matched design without
> need for the STRATA statement.  You can fit the model
> employing an ordinary logistic regression WITHOUT AN
> INTERCEPT and using the difference of the exposure variables
> as the predictor variable.
>
> Code for all of this (using the data set and variables shown
> in your post) would be:
>
>   proc sort data=outf.tendon_short out=tendon_short;
>     by subjid;
>   run;
>
>   data matched_logistic_reg;
>     merge tendon_short(where=(case_flag=1)
>                        rename=(exposure=exposure_case))
>           tendon_short(where=(case_flag^=1)
>                        rename=(exposure=exposure_control));
>     by subjid;
>     exposure_diff = exposure_case - exposure_control;
>     response = 1;
>   run;
>
>   proc logistic data=matched_logistic_reg;
>     model response = exposure_diff / noint;
>   run;
>
> This approach is described by Hosmer and Lemeshow in a
> chapter on matched studies in their book "Applied Logistic
> Regression".  Now, if you have M:N matching, it will be
> another whole kettle of fish.  But let's start out with
> the simple assumption first because I suspect that it will
> meet your need.
>
> By the way, if you do have M:N matching so that the above
> solution will not work for you, then post back to the list
> specifying the maximum values of M and N across all strata.
> We should be able to write code for fitting a conditional
> logistic regression using the procedure NLMIXED.  But we
> would again need to restructure the data to have all
> of the case and control records which are in a stratum on
> a single record.  The NLMIXED procedure would require a
> fair bit of programming to construct the likelihood.
> I would rather not go there unless it is necessary.
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra(a)NO_SPAMfhcrc.org
> Ph:  (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------
>
> --- On Thu, 1/7/10, Christine Peloquin <christinepeloqu...(a)GMAIL.COM> wrote:
>
> > From: Christine Peloquin <christinepeloqu...(a)GMAIL.COM>
> > Subject: proc logistic: 'out of memory'
> > To: SA...(a)LISTSERV.UGA.EDU
> > Date: Thursday, January 7, 2010, 7:01 AM
> > hello.
> >
> > i just started a job at BU. i am running proc logistic on a
> > dataset with 160,000 observations (80,000 cases and 80,000
> > controls) - and am receiving an 'out of memory' message.
> > here is the code that i am running:
> >
> > proc logistic data=outf.tendon_short;
> >  class exposure (ref='0') / param=ref;
> >  strata subjid;
> >  model case_flag (event='1') = exposure;
> > run;
> >
> > both the case_flag and exposure variables are dichotomous
> > (numeric variables; values: 0/1).  the subjid is an 11-char
> > variable.
> >
> > would anyone have a suggestion of how i could resolve this
> > or what i should be looking at to further debug?
> >
> > endless thanks.
> > christine

Dale,

I'm curious if and when it is preferable to run a conditional logistic
regression model in a 1:1 matching design rather than a GEE (pop.
average) or perhaps even a generalized linear mixed model (subj.
specific). I read a conditional logistic regression example online
(http://www.biostat.umn.edu/~will/6470stuff/Lect21/lecture21H.pdf),
where 2 patients from each of the 79 participating clinics were
enrolled in the study. Within each clinic, one of the patients was
assigned to the treatment condition while the other was assigned to
the control condition. I suppose this would be considered a 1:1
matching design. The binary outcome was improve/did not improve. The
suggested code to run a conditional logistic regression model was:

Proc Logistic descending ;
class center treatment(ref="P") gender(ref="F");
model improve = baseline_score treatment;
strata center;

What would be the driving force as to whether you would run a
conditional logistic regression, GEE or perhaps generalized linear
mixed model for this example? Can you think of a scenario where you
would prefer to use a conditional logistic regression in a 1:1
matching design rather than one of the other two options? It's also
interesting in the example code that gender is placed on the class
statement, yet appears nowhere else in the code.

Thanks,

Ryan
From: Brian Sauer on
On Feb 4, 12:40 pm, stringplaye...(a)YAHOO.COM (Dale McLerran) wrote:
> Brian,
>
> The 1:m matched design is quite easy to implement in NLMIXED.
> Note that the data need to be structured with one record for
> each stratum.  The record must have m+1 variables representing
> the case/control status and also m+1 variables representing
> each of the predictor variables.  In the code below, I assume
> that the m+1 response variables are named Y_1-Y_5.  Similarly,
> I assume that there are two predictor variables (X1 and X2)
> which are represented in wide form as X1_1-X1_5 and X2_1-X2_5.
> Thus, the data set would appear as follows:
>
>  stratum  Y_1  Y_2 ... Y_5  X1_1  X1_2 ... X1_5  X2_1  X2_2 ... X2_5
>     1      1    0       0    36    43       39    97    78      102
>     2      1    0       0    39    38       44    92    81       78
>    ...
>
> Now, for a 1:m design, the conditional likelihood is
>
>     l  =  exp(x{case}*beta) /
>           sum from i=1 to m+1 { exp(x{i}*beta) }
>
> See Hosmer and Lemeshow, Applied Logistic Regression, for a
> more detailed description of the conditional likelihood for a
> case/control matched design.
>
> With data constructed as shown above, we could fit the
> conditional logistic regression model for a 1:m design
> (max(m)=4) with the following code:
>
> proc nlmixed data=mydata;
>   parms b_x1 b_x2 0;
>   array Y_ {5};
>   array X1_ {5};
>   array X2_ {5};
>   do i=1 to 5;
>     if Y_{i}=1 then num = exp(b_x1*X1_{i} + b_x2*X2_{i});
>   end;
>   denom = 0;
>   do i=1 to 5;
>     if y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
>        denom = denom + exp(b_x1*X1_{i} + b_x2*X2_{i});
>   end;
>   if num>0 & denom>0 then ll = log(num / denom);
>   else ll = 0;
>
>   model ll ~ general(ll);
> run;
>
> Here is an example which constructs a 1:m design with m=4 for
> all but the last stratum.  In the last stratum, m=2.  Data are
> initially presented in a narrow format with a record for
> every case or control observation.  The conditional logistic
> regression is fit to the narrow data using PROC LOGISTIC.
> Subsequently, the data are reshaped into a wide form and the
> wide form data are passed to the NLMIXED procedure.  You can
> compare the point estimates and standard errors, as well as
> the model fit statistics which are produced by the NLMIXED
> procedure against the same statistics generated by the LOGISTIC
> procedure.  We do get the same results.  (Oh happy day!)
>
> data Data1;
>   do ID=1 to 63;
>     do Outcome = 1 to 0 by -1;
>       count+1;
>       if count=1 then do;
>         stratum+1;
>         y = 1;
>       end;
>       else y = 0;
>       input Gall Hyper @@;
>       output;
>       if count=5 then count=0;
>     end;
>   end;
> datalines;
> 0 0  0 0    0 0  0 0    0 1  0 1    0 0  1 0    1 0  0 1
> 0 1  0 0    1 0  0 0    1 1  0 1    0 0  0 0    0 0  0 0
> 1 0  0 0    0 0  0 1    1 0  0 1    1 0  1 0    1 0  0 1
> 0 1  0 0    0 0  1 1    0 0  1 1    0 0  0 1    0 1  0 0
> 0 0  1 1    0 1  0 1    0 1  0 0    0 0  0 0    0 0  0 0
> 0 0  0 1    1 0  0 1    0 0  0 1    1 0  0 0    0 1  0 0
> 0 1  0 0    0 1  0 0    0 1  0 0    0 0  0 0    1 1  1 1
> 0 0  0 1    0 1  0 0    0 1  0 1    0 1  0 1    0 1  0 0
> 0 0  0 0    0 1  1 0    0 0  0 1    0 0  0 0    1 0  0 0
> 0 0  0 0    1 1  0 0    0 1  0 0    0 0  0 0    0 1  0 1
> 0 0  0 0    0 1  0 1    0 1  0 0    0 1  0 0    1 0  0 0
> 0 0  0 0    1 1  1 0    0 0  0 0    0 0  0 0    1 1  0 0
> 1 0  1 0    0 1  0 0    1 0  0 0
> ;
>
> proc logistic data=Data1;
>   strata stratum;
>   model y(event='1')=Gall Hyper;
> run;
>
> data data2;
>   set data1;
>   by stratum;
>   array y_ {5};
>   array Gall_ {5};
>   array Hyper_ {5};
>   if first.stratum then do;
>     pointer=0;
>     do i=1 to 5;
>       y_{i} = .;
>       gall_{i} = .;
>       hyper_{i} = .;
>     end;
>   end;
>   pointer + 1;
>   y_{pointer} + y;
>   gall_{pointer} + gall;
>   hyper_{pointer} + hyper;
>   if last.stratum then output;
>   keep stratum y_: gall_: hyper_:;
> run;
>
> proc nlmixed data=data2;
>   parms b1 b2 0;
>   array Y_ {5};
>   array X1_ {5} gall_1-gall_5;
>   array X2_ {5} hyper_1-hyper_5;
>   do i=1 to 5;
>     if Y_{i}=1 then num = exp(b1*X1_{i} + b2*X2_{i});
>   end;
>   denom = 0;
>   do i=1 to 5;
>     if y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
>        denom = denom + exp(b1*X1_{i} + b2*X2_{i});
>   end;
>   if num>0 & denom>0 then ll = log(num / denom);
>   else ll = 0;
>
>   model ll ~ general(ll);
> run;
>
> Let me know if this does allow you to fit the 1:m matched design
> in the large data set which you have.  I would think that it
> would, but am not certain as to whether the NLMIXED procedure
> stores the entire data in memory or re-reads data as needed for
> the iterative process.  Storing the data in memory would improve
> computational efficiency for an iterative process.  However, for
> extremely large data sets, you could run out of memory.
>
> Note that it would be wise to pass only the variables which are
> needed for the logistic regression.  This would speed up data
> throughput, and could also reduce the amount of memory required
> to hold data in memory.  Thus, it would be a good idea to use a
> keep option to restrict the variables that are passed into the
> NLMIXED procedure.
>
> HTH,
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra(a)NO_SPAMfhcrc.org
> Ph:  (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------
>
> --- On Wed, 2/3/10, Brian Sauer <brian.sa...(a)GMAIL.COM> wrote:
>
> > From: Brian Sauer <brian.sa...(a)GMAIL.COM>
> > Subject: Re: proc logistic: 'out of memory'
> > To: SA...(a)LISTSERV.UGA.EDU
> > Date: Wednesday, February 3, 2010, 9:02 AM
> > Hi Dale,
> > I am in a similar situation to Christine, but I have a 1:m
> > matching problem.  I am using a case-crossover design and the
> > SAS program I developed allows the user to select the number
> > of control windows - up to 4.  I didn't consider the
> > limitations of conditional logistic when designing this
> > program.  This program is intended to be used on large
> > healthcare databases and could easily have 100,000 cases or
> > so.  Proc logistic with a strata statement returns an out of
> > memory warning.  In your previous post you mentioned a NLMIXED
> > solution.  If you have worked this out would you please share
> > it as this is beyond my skill set at this time.
> > Thanks,
> > Brian
> > http://www.bmi.utah.edu/?module=facultyDetails&personId=8363&orgId=382

WOW!! this is an extremely helpful response. Thanks Dale!
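
A minimal sketch of the KEEP suggestion from the quoted message, assuming
the generic wide-form names Dale uses (mydata, Y_1-Y_5, X1_1-X1_5,
X2_1-X2_5) and a real data set that carries many other columns. It is the
same NLMIXED code with a KEEP= data set option added, and has not been run
against a large data set here:

```sas
/* Read only the columns the likelihood needs (KEEP= data set option). */
proc nlmixed data=mydata(keep=Y_1-Y_5 X1_1-X1_5 X2_1-X2_5);
  parms b_x1 b_x2 0;
  array Y_ {5};
  array X1_ {5};
  array X2_ {5};
  /* numerator: contribution of the case record in the stratum */
  do i=1 to 5;
    if Y_{i}=1 then num = exp(b_x1*X1_{i} + b_x2*X2_{i});
  end;
  /* denominator: sum over every non-missing record in the stratum */
  denom = 0;
  do i=1 to 5;
    if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
       denom = denom + exp(b_x1*X1_{i} + b_x2*X2_{i});
  end;
  if num>0 & denom>0 then ll = log(num / denom);
  else ll = 0;

  model ll ~ general(ll);
run;
```

Whether this actually lowers memory use depends on how NLMIXED buffers
the data, which the quoted message leaves open.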
From: Brian Sauer on
I am in the process of testing the NLMIXED technique that Dale
mentioned for my problem, but I wanted to share the answer I received
from SAS support.
Brian,

The problem is caused by the default check for dependencies between
strata and the predictors in SAS 9.1. This can require a large amount
of memory when there are many strata. You can turn off this check by
specifying the NOLINDEP option in the STRATA statement. For example:

strata strata / nonlindep;

This check is off by default in all the current SAS 9.2 releases.

----
NOTE: If you have any follow-up questions on this matter, please
reply to this email by February 10, 2010.

- - - - - - - - -
David Schlotzhauer
Senior Statistical Consultant
SAS Institute Inc.
Phone: (919) 677-8008
Web: support.sas.com/ts
From: Dale McLerran on
--- On Sun, 2/7/10, Brian Sauer <brian.sauer(a)GMAIL.COM> wrote:

> From: Brian Sauer <brian.sauer(a)GMAIL.COM>
> Subject: Re: proc logistic: 'out of memory'
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Sunday, February 7, 2010, 4:47 PM
> I am in the process of testing the NLMIXED technique that Dale
> mentioned for my problem, but I wanted to share the answer I received
> from SAS support.
> Brian,
>
> The problem is caused by the default check for dependencies between
> strata and the predictors in SAS 9.1. This can require a large amount
> of memory when there are many strata. You can turn off this check by
> specifying the NOLINDEP option in the STRATA statement. For example:
>
> strata strata / nonlindep;
>
> This check is off by default in all the current SAS 9.2 releases.
>
> ----
> NOTE: If you have any follow-up questions on this matter, please
> reply to this email by February 10, 2010.
>
> - - - - - - - - -
> David Schlotzhauer
> Phone: (919) 677-8008
> Senior Statistical Consultant Web:
> support.sas.com/ts
> SAS Institute Inc.
>

OK, that is useful to know. However, the syntax specified above
does not match syntax in documentation of version 9.1.3 or 9.2.
The 9.2 documentation indicates the following syntax which I
would think would be the official syntax:

strata ... / CHECKDEPENDENCY=NONE;


Neither the CHECKDEPENDENCY option nor the NONLINDEP option is
indicated in version 9.1.3 documentation. Have you tried the
syntax with NONLINDEP specified as an option in one (or both)
of these versions?
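
For reference, a minimal sketch of how the documented 9.2 option would
slot into Christine's original call. This assumes SAS 9.2 or later and
has not been run against her data; as noted, the option is not in the
9.1.3 documentation:

```sas
proc logistic data=outf.tendon_short;
  class exposure (ref='0') / param=ref;
  /* skip the strata/covariate dependency check, which is costly with many strata */
  strata subjid / checkdependency=none;
  model case_flag (event='1') = exposure;
run;
```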

Just as an aside, I am sure that it is reasonable to perform
this sort of linear dependency check when the number of strata
is small. However, it would seem that SI might have implemented
the linear dependency checking with an initial determination of
the number of strata in the data set. If the number of strata
is above a certain level (say, 1000), then the linear dependency
checking would not be enforced (under the assumption that given
a large number of strata, it is not likely that one would
encounter complete linear dependency between covariates and
strata). But from the standpoint of a computer programmer, I
can see where it is always good to evaluate whether the data
conform to the requirements of the estimation procedure.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
From: Dale McLerran on
--- On Sat, 2/6/10, Ryan <ryan.andrew.black(a)GMAIL.COM> wrote:

> From: Ryan <ryan.andrew.black(a)GMAIL.COM>
> Subject: Re: proc logistic: 'out of memory'
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Saturday, February 6, 2010, 7:14 PM
> On Jan 7, 2:31 pm, stringplaye...(a)YAHOO.COM
>
> Dale,
>
> I'm curious if and when it is preferable to run a conditional logistic
> regression model in a 1:1 matching design rather than a GEE (pop.
> average) or perhaps even a generalized linear mixed model (subj.
> specific). I read a conditional logistic regression example online
> (http://www.biostat.umn.edu/~will/6470stuff/Lect21/lecture21H.pdf),
> where 2 patients from each of the 79 participating clinics were
> enrolled in the study. Within each clinic, one of the patients was
> assigned to the treatment condition while the other was assigned to
> the control condition. I suppose this would be considered a 1:1
> matching design. The binary outcome was improve/did not improve. The
> suggested code to run a conditional logistic regression model was:
>
> Proc Logistic descending ;
> class center treatment(ref="P") gender(ref="F");
> model improve = baseline_score treatment;
> strata center;
>
> What would be the driving force as to whether you would run a
> conditional logistic regression, GEE or perhaps generalized linear
> mixed model for this example? Can you think of a scenario where you
> would prefer to use a conditional logistic regression in a 1:1
> matching design rather than one of the other two options? It's also
> interesting in the example code that gender is placed on the class
> statement, yet appears nowhere else in the code.
>
> Thanks,
>
> Ryan
>

Ryan,

Let me point you to a SUGI presentation from 2002 (SUGI 27) by
Oliver Kuss as a good read on this topic. See

http://www2.sas.com/proceedings/sugi27/p261-27.pdf

Oliver demonstrates that the conditional logistic regression
parameter estimates are similar to the parameter estimates that
one would obtain for the mixed model fitted using NLMIXED or
GLIMMIX. It is interesting to note that a stratified analysis
and an analysis in which random effects are estimated in the
model are both referred to as conditional logistic regression
models.

The stratified model has some advantages in that it is
semi-nonparametric. It is not necessary to assume that the
subject/stratum random effects are normally distributed.
Also, it is worth noting that even if the subject random
effects are normally distributed, it is not necessary to
estimate the subject random effects using the stratified
analysis. To the extent that Gaussian quadrature does not
produce exact integration over the random effects, the results
from a method which approximates the integration have some loss.
For shared parameters, the stratified analysis may actually be
preferred to results from a procedure which estimates the
random effects.

But the stratified analysis also has some loss in that the
intercept is conditioned out of the model. Thus, we cannot
estimate the probability of the outcome for the j-th
observation from the i-th stratum.
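
To make that concrete for the 1:1 case, here is a sketch of the standard
algebra. The notation is assumed for illustration and does not come from
the thread: \alpha_i is the intercept of stratum i, x_{i1} and x_{i0} are
the covariate values of the case and the control, and \beta is the shared
slope.

```latex
\Pr(\text{subject 1 is the case} \mid \text{one case in stratum } i)
  = \frac{e^{\alpha_i + x_{i1}\beta}}
         {e^{\alpha_i + x_{i1}\beta} + e^{\alpha_i + x_{i0}\beta}}
  = \frac{1}{1 + e^{-(x_{i1} - x_{i0})\beta}}
```

The stratum intercept cancels from numerator and denominator, which is
why it cannot be recovered afterwards. What remains is an intercept-free
logistic term in the within-pair difference of the covariates.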

I have noted before - and Oliver discusses this as well with
a little different twist from how I typically present the case -
that the random effects model and the GEE model do estimate
different effects. The GEE model estimates marginal or population
average estimates of effects. If you want to know for policy
purposes what the average effect estimate will be in a population,
then you would prefer to present estimates obtained from a model
estimated employing GEE (or similar) effect estimation. However,
if you want to know what is the benefit to the i-th subject,
you want a conditional or subject-specific effect estimate.
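
As a rough sketch of how the two alternatives might be coded for the
clinic example in Ryan's question: the data set name trial and a 0/1
numeric coding of improve are assumptions made here, and neither call
has been run. PROC GENMOD with a REPEATED statement gives the
population-average (GEE) estimates; PROC GLIMMIX with a random clinic
intercept gives the subject-specific ones.

```sas
/* GEE: marginal (population-average) treatment effect */
proc genmod data=trial descending;
  class center treatment;
  model improve = baseline_score treatment / dist=binomial link=logit;
  repeated subject=center / type=exch;
run;

/* GLMM: clinic-specific effect, random intercept integrated by quadrature */
proc glimmix data=trial method=quad;
  class center treatment;
  model improve(event='1') = baseline_score treatment
        / dist=binary link=logit solution;
  random intercept / subject=center;
run;
```

With only two patients per clinic, the working correlation structure in
the GEE call matters little; exchangeable is used simply because it is
the natural choice for a pair.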

As for the example of the stratified conditional analysis which
included gender in the CLASS statement but nowhere else, the
inclusion of gender in the class statement was no doubt residual
from a full model in which observation-specific effects (possibly
including age as a continuous effect) were incorporated into the
model. As long as every person has a gender specification, then
naming gender on the CLASS statement would not affect the
estimates obtained for the MODEL statement which was specified.

HTH,

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------