proc logistic: 'out of memory' [SAS]

From: Dale McLerran on 7 Jan 2010 14:31

Christine,

Apparently, you have a case/control design since you are
using a STRATA statement. You also indicate that you have
80,000 case records and 80,000 control records which would
suggest further that you might have a 1:1 matched study.
If so, then you can restructure your data so that you can
use a simple logistic regression. That should solve your
out-of-memory problem.

So, if you have a 1:1 matched design, here is what you can
do. First, merge the matched case and control records
by stratum (subjid) renaming the exposure variable so that
you have a case exposure variable and a control exposure
variable. We want to compute the difference between the
two exposure variable values. At the same time, you need
to construct a new response variable which has value 1
for ALL records.

With the restructured data, you can fit the conditional
logistic regression model for the 1:1 matched design without
need for the STRATA statement. You can fit the model
employing an ordinary logistic regression WITHOUT AN
INTERCEPT and using the difference of the exposure variables
as the predictor variable.
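
A short derivation may help show why the no-intercept model on the difference works. The notation is assumed here, not taken from the post: for matched pair k, x_{k1} is the case's exposure, x_{k0} is the control's exposure, and d_k is their difference.

```latex
% Conditional likelihood contribution of one 1:1 matched pair k.
% Dividing numerator and denominator by exp(x_{k0} beta) leaves a
% logistic probability in the difference d_k, with no intercept term.
\[
L_k(\beta)
  = \frac{\exp(x_{k1}\beta)}{\exp(x_{k1}\beta) + \exp(x_{k0}\beta)}
  = \frac{\exp(d_k\beta)}{1 + \exp(d_k\beta)},
\qquad d_k = x_{k1} - x_{k0}.
\]
```

The right-hand side is the probability that the response equals 1 in an ordinary logistic model with predictor d_k and no intercept, which is why the response is set to 1 on every record. Pairs with d_k = 0 contribute 1/2 whatever the value of beta, so they carry no information about the exposure effect.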

Code for all of this (using the data set and variables shown
in your post) would be:

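/* Sort so the case and control records of each pair are adjacent */
proc sort data=outf.tendon_short out=tendon_short;
  by subjid;
run;

/* One record per matched pair: case exposure minus control exposure */
data matched_logistic_reg;
  merge tendon_short(where=(case_flag=1)
                     rename=(exposure=exposure_case))
        tendon_short(where=(case_flag^=1)
                     rename=(exposure=exposure_control));
  by subjid;
  exposure_diff = exposure_case - exposure_control;
  response = 1;   /* constant response for every pair */
run;

/* Ordinary logistic regression without an intercept */
proc logistic data=matched_logistic_reg;
  model response = exposure_diff / noint;
run;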

This approach is described by Hosmer and Lemeshow in a
chapter on matched studies in their book "Applied Logistic
Regression". Now, if you have M:N matching, it will be
another whole kettle of fish. But let's start out with
the simple assumption first because I suspect that it will
meet your need.

By the way, if you do have M:N matching so that the above
solution will not work for you, then post back to the list
specifying the maximum values of M and N across all strata.
We should be able to write code for fitting a conditional
logistic regression using the procedure NLMIXED. But we
would again need to restructure the data to have all
of the case and control records which are in a stratum on
a single record. The NLMIXED procedure would require a
fair bit of programming to construct the likelihood.
I would rather not go there unless it is necessary.
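
Before choosing between the 1:1 approach and the NLMIXED route, it may be worth checking the matching pattern directly. The following is a minimal sketch, assuming the data set and variable names from Christine's post (outf.tendon_short, subjid, case_flag); the names stratum_counts, n_cases, n_controls, max_m and max_n are made up for illustration.

```sas
/* Count case and control records within each stratum */
proc sql;
  create table stratum_counts as
  select subjid,
         sum(case_flag = 1)  as n_cases,
         sum(case_flag ne 1) as n_controls
  from outf.tendon_short
  group by subjid;

  /* Largest M (cases) and N (controls) across all strata */
  select max(n_cases)    as max_m,
         max(n_controls) as max_n
  from stratum_counts;
quit;

/* Distribution of matching patterns, e.g. 1:1, 1:2, ... */
proc freq data=stratum_counts;
  tables n_cases * n_controls / list;
run;
```

If every stratum shows n_cases = 1 and n_controls = 1, the merge-and-difference approach applies. Otherwise the two maxima are the values of M and N requested above.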

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Thu, 1/7/10, Christine Peloquin <christinepeloquin1(a)GMAIL.COM> wrote:

> From: Christine Peloquin <christinepeloquin1(a)GMAIL.COM>
> Subject: proc logistic: 'out of memory'
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, January 7, 2010, 7:01 AM
> hello.
>
> i just started a job at BU. i am running proc logistic on a dataset with
> 160,000 observations (80,000 cases and 80,000 controls) - and am receiving
> an 'out of memory' message. here is the code that i am running:
>
> proc logistic data=outf.tendon_short;
>   class exposure (ref='0') / param=ref;
>   strata subjid;
>   model case_flag (event='1') = exposure;
> run;
>
> both the case_flag and exposure variables are dichotomous (numeric
> variables; values: 0/1). the subjid is an 11-char variable.
>
> would anyone have a suggestion of how i could resolve this or what i should
> be looking at to further debug?
>
> endless thanks.
> christine

From: Brian Sauer on 3 Feb 2010 12:02

Hi Dale,
I am in a similar situation to Christine, but I have a 1:m matching
problem. I am using a case-crossover design, and the SAS program I
developed allows the user to select the number of control windows, up
to 4. I didn't consider the limitations of conditional logistic
regression when designing this program. The program is intended to be
used on large healthcare databases and could easily have 100,000 cases
or so. PROC LOGISTIC with a STRATA statement returns an out-of-memory
warning. In your previous post you mentioned an NLMIXED solution. If
you have worked this out, would you please share it, as this is beyond
my skill set at this time.
Thanks,
Brian
http://www.bmi.utah.edu/?module=facultyDetails&personId=8363&orgId=382

From: Dale McLerran on 4 Feb 2010 14:40

Brian,

The 1:m matched design is quite easy to implement in NLMIXED.
Note that the data need to be structured with one record for
each stratum. The record must have m+1 variables representing
the case/control status and also m+1 variables representing
each of the predictor variables. In the code below, I assume
that the m+1 response variables are named Y_1-Y_5. Similarly,
I assume that there are two predictor variables (X1 and X2)
which are represented in wide form as X1_1-X1_5 and X2_1-X2_5.
Thus, the data set would appear as follows:

stratum  Y_1  Y_2  ...  Y_5  X1_1  X1_2  ...  X1_5  X2_1  X2_2  ...  X2_5
      1    1    0  ...    0    36    43  ...    39    97    78  ...   102
      2    1    0  ...    0    39    38  ...    44    92    81  ...    78
    ...

Now, for a 1:m design, the conditional likelihood is

l = exp(x{case}*beta) /
sum from i=1 to m+1 { exp(x{i}*beta) }

See Hosmer and Lemeshow, Applied Logistic Regression, for a
more detailed description of the conditional likelihood for a
case/control matched design.
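
For reference, the likelihood above can be written out per stratum, together with the log-likelihood that is actually maximized. The subscripts are an assumed notation: k indexes strata, m_k is the number of controls in stratum k (so it may vary between strata), and x_{ki} is the covariate vector of record i in stratum k.

```latex
% Contribution of stratum k, and the log-likelihood summed over strata.
\[
L_k(\beta)
  = \frac{\exp(x_{k,\mathrm{case}}\,\beta)}
         {\sum_{i=1}^{m_k+1} \exp(x_{ki}\,\beta)},
\qquad
\log L(\beta)
  = \sum_k \Bigl[\, x_{k,\mathrm{case}}\,\beta
      - \log \sum_{i=1}^{m_k+1} \exp(x_{ki}\,\beta) \Bigr].
\]
```

The denominator runs over the case and all of its matched controls, so a stratum with fewer than the maximum number of controls simply has fewer terms in the sum.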

With data constructed as shown above, we could fit the
conditional logistic regression model for a 1:m design (max(m)=4)
with the following code:

proc nlmixed data=mydata;
  parms b_x1 b_x2 0;
  array Y_ {5};
  array X1_ {5};
  array X2_ {5};
  /* numerator: exp(x*beta) for the case record */
  do i=1 to 5;
    if Y_{i}=1 then num = exp(b_x1*X1_{i} + b_x2*X2_{i});
  end;
  /* denominator: sum of exp(x*beta) over all non-missing records */
  denom = 0;
  do i=1 to 5;
    if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
      denom = denom + exp(b_x1*X1_{i} + b_x2*X2_{i});
  end;
  if num>0 & denom>0 then ll = log(num / denom);
  else ll = 0;

  model ll ~ general(ll);
run;

Here is an example which constructs a 1:m design with m=4 for
all but the last stratum. In the last stratum, m=2. Data are
initially presented in a narrow format with a record for every
case or control observation. The conditional logistic
regression is fit to the narrow data using PROC LOGISTIC.
Subsequently, the data are reshaped into a wide form and the
wide form data are passed to the NLMIXED procedure. You can
compare the point estimates and standard errors, as well as
the model fit statistics which are produced by the NLMIXED
procedure against the same statistics generated by the LOGISTIC
procedure. We do get the same results. (Oh happy day!)

data Data1;
  do ID=1 to 63;
    do Outcome = 1 to 0 by -1;
      count+1;
      if count=1 then do;
        stratum+1;
        y = 1;
      end;
      else y = 0;
      input Gall Hyper @@;
      output;
      if count=5 then count=0;
    end;
  end;
datalines;
0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 0 1 0 0 1 0 0 1 0 0 0
;

proc logistic data=Data1;
  strata stratum;
  model y(event='1')=Gall Hyper;
run;

data data2;
  set data1;
  by stratum;
  array y_ {5};
  array Gall_ {5};
  array Hyper_ {5};
  if first.stratum then do;
    pointer=0;
    do i=1 to 5;
      y_{i} = .;
      gall_{i} = .;
      hyper_{i} = .;
    end;
  end;
  pointer + 1;
  y_{pointer} + y;
  gall_{pointer} + gall;
  hyper_{pointer} + hyper;
  if last.stratum then output;
  keep stratum y_: gall_: hyper_:;
run;

proc nlmixed data=data2;
  parms b1 b2 0;
  array Y_ {5};
  array X1_ {5} gall_1-gall_5;
  array X2_ {5} hyper_1-hyper_5;
  do i=1 to 5;
    if Y_{i}=1 then num = exp(b1*X1_{i} + b2*X2_{i});
  end;
  denom = 0;
  do i=1 to 5;
    if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
      denom = denom + exp(b1*X1_{i} + b2*X2_{i});
  end;
  if num>0 & denom>0 then ll = log(num / denom);
  else ll = 0;

  model ll ~ general(ll);
run;

Let me know if this does allow you to fit the 1:m matched design
in the large data set which you have. I would think that it
would, but am not certain as to whether the NLMIXED procedure
stores the entire data in memory or re-reads data as needed for
the iterative process. Storing the data in memory would improve
computational efficiency for an iterative process. However, for
extremely large data sets, you could run out of memory.

Note that it would be wise to pass only the variables which are
needed for the logistic regression. This would speed up data
throughput, and could also reduce the amount of memory required
to hold data in memory. Thus, it would be a good idea to use a
keep option to restrict the variables that are passed into the
NLMIXED procedure.
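
As a rough illustration of the keep suggestion, the call below restricts the input to the fifteen analysis variables. It is only a sketch, assuming the wide data set data2 built in the example above; the FULLSTIMER system option is added so that the log reports memory use, which may help settle whether the procedure holds the data in memory.

```sas
/* Assumption: data2 is the wide data set from the example above */
options fullstimer;   /* write memory and timing details to the log */

proc nlmixed data=data2(keep=y_1-y_5 gall_1-gall_5 hyper_1-hyper_5);
  parms b1 b2 0;
  array Y_  {5};
  array X1_ {5} gall_1-gall_5;
  array X2_ {5} hyper_1-hyper_5;
  do i=1 to 5;
    if Y_{i}=1 then num = exp(b1*X1_{i} + b2*X2_{i});
  end;
  denom = 0;
  do i=1 to 5;
    if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
      denom = denom + exp(b1*X1_{i} + b2*X2_{i});
  end;
  if num>0 & denom>0 then ll = log(num / denom);
  else ll = 0;

  model ll ~ general(ll);
run;
```

Here data2 already contains little else, so the gain is small. The option matters more when the wide data set is derived from a large source table that carries many unrelated columns.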

HTH,

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------


From: Oliver Kuss on 5 Feb 2010 02:50

Dale,
thank you for sharing another excellent piece of NLMIXED code with us.
Maybe someone is interested in another piece of code for analysing
1:m-matched data. Before the days of PROC NLMIXED and the STRATA
statement in PROC LOGISTIC, PROC PHREG was the first choice for these
models (check example 5 in the PHREG documentation). The following
PHREG code also works with Dale's data example. Please note that the
definition of the variable STATUS is added to the first data step.

Hope that helps,
Oliver

data Data1;
  do ID=1 to 63;
    do Outcome = 1 to 0 by -1;
      count+1;
      if count=1 then do;
        stratum+1;
        y = 1;
      end;
      else y = 0;
      input Gall Hyper @@;
      status=2-y;   /* pseudo survival time: 1 for the case, 2 for controls */
      output;
      if count=5 then count=0;
    end;
  end;

datalines;
0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 0 1 0 0 1 0 0 1 0 0 0
;

proc phreg data=Data1;
  model status*y(0)=Gall Hyper / ties=discrete;
  strata stratum;
run;

From: Dale McLerran on 5 Feb 2010 12:31

Oliver,

Thanks for posting this. Yes, the PHREG procedure was for a
long time the only procedure in SAS advertised as suitable for
fitting the 1:m case/control model. It certainly is an alternative
which should be considered for the 1:m matched design. My guess
is that it would be at least as memory intensive as the LOGISTIC
procedure for the 1:m matched design, but I certainly do not
know that with any authority. I also can't state with certainty
that the NLMIXED procedure would require less memory than the
LOGISTIC or PHREG procedures - although if NLMIXED re-reads data,
then examination of the likelihood model would indicate that the
NLMIXED procedure would not need much memory. But if the data
are held in memory for the duration of execution of the NLMIXED
procedure, then one could encounter an out-of-memory issue with
extremely large data sets.

Dale
