From: Dale McLerran on 7 Jan 2010 14:31

Christine,

Apparently, you have a case/control design, since you are using a STRATA statement. You also indicate that you have 80,000 case records and 80,000 control records, which further suggests that you might have a 1:1 matched study. If so, you can restructure your data so that you can use a simple logistic regression. That should solve your out-of-memory problem.

So, if you have a 1:1 matched design, here is what you can do. First, merge the matched case and control records by stratum (subjid), renaming the exposure variable so that you have a case exposure variable and a control exposure variable. We want to compute the difference between the two exposure values. At the same time, you need to construct a new response variable that has the value 1 for ALL records.

With the restructured data, you can fit the conditional logistic regression model for the 1:1 matched design without the STRATA statement: fit an ordinary logistic regression WITHOUT AN INTERCEPT, using the difference of the exposure variables as the predictor.

Code for all of this (using the data set and variables shown in your post) would be:

   proc sort data=outf.tendon_short out=tendon_short;
      by subjid;
   run;

   /* One record per matched pair: case and control exposures side by side. */
   /* Assumes exactly one case and one control per subjid.                 */
   data matched_logistic_reg;
      merge tendon_short(where=(case_flag=1)
                         rename=(exposure=exposure_case))
            tendon_short(where=(case_flag^=1)
                         rename=(exposure=exposure_control));
      by subjid;
      exposure_diff = exposure_case - exposure_control;
      response = 1;   /* constant response for every pair */
   run;

   /* Conditional model for 1:1 pairs: no intercept, predictor = difference */
   proc logistic data=matched_logistic_reg;
      model response = exposure_diff / noint;
   run;

This approach is described by Hosmer and Lemeshow in the chapter on matched studies in their book "Applied Logistic Regression". Now, if you have M:N matching, it will be another whole kettle of fish. But let's start with the simple assumption first, because I suspect that it will meet your need.
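Why the differencing trick works, plus a quick cross-check (a sketch, not part of the original post). For one matched pair, the conditional probability that the case is the observed case is exp(b*x_case) / (exp(b*x_case) + exp(b*x_control)) = 1 / (1 + exp(-b*(x_case - x_control))), which is exactly a logistic model with response 1, no intercept, and covariate x_case - x_control. With a 0/1 exposure, the conditional MLE of the odds ratio reduces to the ratio of discordant pairs, n10/n01, so exp(beta) from the NOINT model should match the tally below. The sketch assumes the data set MATCHED_LOGISTIC_REG built by the DATA step above; N10, N01 and OR_HAT are illustrative names introduced here.

```sas
/* Sketch: discordant-pair check for a 1:1 design with a 0/1 exposure.  */
/* Assumes MATCHED_LOGISTIC_REG from the DATA step above (one record    */
/* per matched pair). N10, N01 and OR_HAT are illustrative names.       */
data _null_;
   set matched_logistic_reg end=last;
   /* pairs in which only the case is exposed */
   n10 + (exposure_case = 1 and exposure_control = 0);
   /* pairs in which only the control is exposed */
   n01 + (exposure_case = 0 and exposure_control = 1);
   if last then do;
      if n01 > 0 then or_hat = n10 / n01;   /* conditional MLE of the OR */
      put 'Discordant pairs: ' n10= n01= or_hat=;
   end;
run;

/* McNemar's test on the same pairs (AGREE option of PROC FREQ) */
proc freq data=matched_logistic_reg;
   tables exposure_case*exposure_control / agree;
run;
```

Concordant pairs (exposure_diff = 0) contribute only a constant to the likelihood, which is why only the discordant counts enter the estimate.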
By the way, if you do have M:N matching, so that the above solution will not work for you, then post back to the list specifying the maximum values of M and N across all strata. We should be able to write code for fitting a conditional logistic regression using the NLMIXED procedure. But we would again need to restructure the data so that all of the case and control records in a stratum are on a single record. The NLMIXED procedure would require a fair bit of programming to construct the likelihood. I would rather not go there unless it is necessary.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Thu, 1/7/10, Christine Peloquin <christinepeloquin1(a)GMAIL.COM> wrote:

> From: Christine Peloquin <christinepeloquin1(a)GMAIL.COM>
> Subject: proc logistic: 'out of memory'
> To: SAS-L(a)LISTSERV.UGA.EDU
> Date: Thursday, January 7, 2010, 7:01 AM
>
> Hello.
>
> I just started a job at BU. I am running PROC LOGISTIC on a dataset with
> 160,000 observations (80,000 cases and 80,000 controls) and am receiving
> an 'out of memory' message. Here is the code that I am running:
>
>    proc logistic data=outf.tendon_short;
>       class exposure (ref='0') / param=ref;
>       strata subjid;
>       model case_flag (event='1') = exposure;
>    run;
>
> Both the case_flag and exposure variables are dichotomous (numeric
> variables; values: 0/1). The subjid is an 11-character variable.
>
> Would anyone have a suggestion of how I could resolve this, or what I
> should be looking at to further debug?
>
> Endless thanks.
> Christine
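The maximum M and N that Dale asks for can be tallied directly from the narrow data. This is a sketch, assuming (as Dale does) that SUBJID identifies the matched set and that CASE_FLAG=1 marks cases; every other record is counted as a control. The column names MAX_M, MAX_N and N_SETS are illustrative.

```sas
/* Sketch: largest number of cases (M) and controls (N) in any matched  */
/* set. Assumes SUBJID identifies the set and CASE_FLAG=1 marks cases.  */
proc sql;
   select max(n_cases)    as max_m  label='Max cases per set',
          max(n_controls) as max_n  label='Max controls per set',
          count(*)        as n_sets label='Number of matched sets'
   from (select subjid,
                sum(case_flag = 1)  as n_cases,      /* 1/0 flags summed */
                sum(case_flag ne 1) as n_controls
         from outf.tendon_short
         group by subjid) as sets;
quit;
```

If both maxima are 1, the differencing approach above applies; MAX_M = 1 with a larger MAX_N is the 1:m situation.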
From: Brian Sauer on 3 Feb 2010 12:02

Hi Dale,

I am in a similar situation to Christine's, but I have a 1:m matching problem. I am using a case-crossover design, and the SAS program I developed allows the user to select the number of control windows, up to 4. I didn't consider the limitations of conditional logistic regression when designing this program. The program is intended to be used on large healthcare databases and could easily have 100,000 cases or so. PROC LOGISTIC with a STRATA statement returns an out-of-memory warning. In your previous post you mentioned an NLMIXED solution. If you have worked this out, would you please share it, as this is beyond my skill set at this time.

Thanks,
Brian
http://www.bmi.utah.edu/?module=facultyDetails&personId=8363&orgId=382
From: Dale McLerran on 4 Feb 2010 14:40

Brian,

The 1:m matched design is quite easy to implement in NLMIXED. Note that the data need to be structured with one record for each stratum. The record must have m+1 variables representing the case/control status and also m+1 variables for each of the predictor variables. In the code below, I assume that the m+1 response variables are named Y_1-Y_5. Similarly, I assume that there are two predictor variables (X1 and X2), which are represented in wide form as X1_1-X1_5 and X2_1-X2_5. Thus, the data set would appear as follows:

   stratum  Y_1 Y_2 ... Y_5   X1_1 X1_2 ... X1_5   X2_1 X2_2 ... X2_5
      1      1   0  ...  0     36   43  ...  39     97   78  ...  102
      2      1   0  ...  0     39   38  ...  44     92   81  ...   78
     ...

Now, for a 1:m design, the conditional likelihood contribution of one stratum is

   l = exp(x{case}*beta) / [ sum from i=1 to m+1 of exp(x{i}*beta) ]

and the log likelihood being maximized is the sum of log(l) over all strata. See Hosmer and Lemeshow, Applied Logistic Regression, for a more detailed description of the conditional likelihood for a case/control matched design.

With data constructed as shown above, we could fit the conditional logistic regression model for a 1:m design (max(m)=4) with the following code:

   proc nlmixed data=mydata;
      parms b_x1 b_x2 0;
      array Y_ {5};
      array X1_ {5};
      array X2_ {5};
      /* numerator: exp of the case's linear predictor (Y_i=1) */
      num = 0;
      do i=1 to 5;
         if Y_{i}=1 then num = exp(b_x1*X1_{i} + b_x2*X2_{i});
      end;
      /* denominator: sum over every non-missing member of the matched set */
      denom = 0;
      do i=1 to 5;
         if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
            denom = denom + exp(b_x1*X1_{i} + b_x2*X2_{i});
      end;
      /* log conditional likelihood contribution of this stratum */
      if num>0 & denom>0 then ll = log(num / denom);
      else ll = 0;
      model ll ~ general(ll);
   run;

Here is an example which constructs a 1:m design with m=4 for all but the last stratum; the last stratum contains only the case record (m=0) and therefore contributes nothing to the conditional likelihood. The data are initially presented in a narrow format, with a record for every case or control observation. The conditional logistic regression is fit to the narrow data using PROC LOGISTIC. Subsequently, the data are reshaped into wide form and the wide-form data are passed to the NLMIXED procedure.
You can compare the point estimates and standard errors, as well as the model fit statistics, produced by the NLMIXED procedure against the same statistics generated by the LOGISTIC procedure. We do get the same results. (Oh happy day!)

   data Data1;
      do ID=1 to 63;
         do Outcome = 1 to 0 by -1;
            count+1;
            /* every 5 records start a new matched set; the first is the case */
            if count=1 then do;
               stratum+1;
               y = 1;
            end;
            else y = 0;
            input Gall Hyper @@;
            output;
            if count=5 then count=0;
         end;
      end;
      datalines;
0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 0 1 0 0 1 0 0 1 0 0 0
;

   proc logistic data=Data1;
      strata stratum;
      model y(event='1')=Gall Hyper;
   run;

   /* Reshape to one record per matched set */
   data data2;
      set data1;
      by stratum;
      array y_ {5};
      array Gall_ {5};
      array Hyper_ {5};
      /* start of a matched set: clear the wide arrays */
      if first.stratum then do;
         pointer=0;
         do i=1 to 5;
            y_{i} = .;
            gall_{i} = .;
            hyper_{i} = .;
         end;
      end;
      /* put this record in the next slot; sum statements retain values */
      pointer + 1;
      y_{pointer} + y;
      gall_{pointer} + gall;
      hyper_{pointer} + hyper;
      if last.stratum then output;
      keep stratum y_: gall_: hyper_:;
   run;

   proc nlmixed data=data2;
      parms b1 b2 0;
      array Y_ {5};
      array X1_ {5} gall_1-gall_5;
      array X2_ {5} hyper_1-hyper_5;
      num = 0;
      do i=1 to 5;
         if Y_{i}=1 then num = exp(b1*X1_{i} + b2*X2_{i});
      end;
      denom = 0;
      do i=1 to 5;
         if Y_{i} in (0,1) & nmiss(X1_{i}, X2_{i})=0 then
            denom = denom + exp(b1*X1_{i} + b2*X2_{i});
      end;
      if num>0 & denom>0 then ll = log(num / denom);
      else ll = 0;
      model ll ~ general(ll);
   run;

Let me know if this does allow you to fit the 1:m matched design in the large data set which you have.
I would think that it would, but I am not certain whether the NLMIXED procedure stores the entire data set in memory or re-reads the data as needed for the iterative process. Storing the data in memory would improve computational efficiency for an iterative process. However, for extremely large data sets, you could run out of memory.

Note that it would be wise to pass only the variables that are needed for the logistic regression. This would speed up data throughput and could also reduce the amount of memory required to hold the data. Thus, it would be a good idea to use a KEEP= data set option to restrict the variables that are passed into the NLMIXED procedure.

HTH,

Dale
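The NLMIXED likelihood above assumes exactly one case (Y_i = 1) per matched set: if a set had two cases, NUM would simply hold the value for the last one in the loop. A pre-check on the wide data set could look like the sketch below. It uses DATA2 and Y_1-Y_5 from the example; the data set BAD_STRATA and the variable N_CASES are hypothetical names, and the KEEP= option on the SET statement shows reading only the variables that are needed.

```sas
/* Sketch: flag matched sets that do not contain exactly one case.  */
/* Uses DATA2 (wide form, Y_1-Y_5) from the example above.          */
data bad_strata;
   set data2(keep=stratum y_1-y_5);
   n_cases = sum(of y_1-y_5);      /* SUM ignores empty (missing) slots */
   if n_cases ne 1 then output;    /* 0 cases, 2+ cases, or all missing */
   keep stratum n_cases;
run;

proc print data=bad_strata;
   title 'Matched sets without exactly one case';
run;
title;
```

An empty BAD_STRATA means every set matches the one-case layout the likelihood code expects.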
From: Oliver Kuss on 5 Feb 2010 02:50

Dale,
thank you for sharing another excellent piece of NLMIXED code with us.

Maybe someone is interested in another piece of code for analysing 1:m-matched data. Before the days of PROC NLMIXED and the STRATA statement in PROC LOGISTIC, PROC PHREG was the first choice for these models (see Example 5 in the PHREG documentation). The following PHREG code also works with Dale's example data. Please note that the definition of the variable STATUS has been added to the first data step.
Hope that helps,
Oliver

   data Data1;
      do ID=1 to 63;
         do Outcome = 1 to 0 by -1;
            count+1;
            if count=1 then do;
               stratum+1;
               y = 1;
            end;
            else y = 0;
            input Gall Hyper @@;
            status=2-y;   /* dummy time: 1 for the case, 2 for controls */
            output;
            if count=5 then count=0;
         end;
      end;
      datalines;
0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1
0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1
0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0
0 0 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0
0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 1 1
0 0 0 1 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0
0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0
1 0 1 0 0 1 0 0 1 0 0 0
;

   /* controls (y=0) are censored; each stratum is a separate risk set */
   proc phreg data=Data1;
      model status*y(0)=Gall Hyper / ties=discrete;
      strata stratum;
   run;
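The same time/censoring trick can be applied to the original narrow data from the first post without any reshaping. This is a sketch using the variable names from Christine's post (OUTF.TENDON_SHORT, SUBJID, CASE_FLAG, EXPOSURE); the data set TENDON_PHREG and the variable TIME are hypothetical names. Cases get the earlier dummy time (1) and controls are censored at time 2, so the risk set at time 1 in each stratum is the whole matched set.

```sas
/* Sketch: conditional logistic regression via PROC PHREG on narrow data. */
/* TIME is a dummy survival time: 1 for the case, 2 for its controls.     */
data tendon_phreg;
   set outf.tendon_short(keep=subjid case_flag exposure);
   time = 2 - case_flag;          /* assumes case_flag is coded 0/1 */
run;

proc phreg data=tendon_phreg;
   strata subjid;
   /* CASE_FLAG=0 marks censored (control) records */
   model time*case_flag(0) = exposure / ties=discrete;
run;
```

When every matched set has exactly one case, there is only one event per stratum, so the TIES= method should not change the estimates; TIES=DISCRETE matters when a stratum contains several cases.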
From: Dale McLerran on 5 Feb 2010 12:31

Oliver,

Thanks for posting this. Yes, for a long time the PHREG procedure was the only procedure in SAS advertised as suitable for fitting the 1:m case/control model. It certainly is an alternative that should be considered for the 1:m matched design. My guess is that it would be at least as memory intensive as the LOGISTIC procedure for the 1:m matched design, but I certainly do not know that with any authority.
I also can't state with certainty that the NLMIXED procedure would require less memory than the LOGISTIC or PHREG procedures, although if NLMIXED re-reads the data, then examination of the likelihood model indicates that the NLMIXED procedure would not need much memory. But if the data are held in memory for the duration of the NLMIXED run, then one could encounter an out-of-memory issue with extremely large data sets.

Dale
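The memory question can be checked empirically by asking SAS to report resource usage for each step. This sketch uses the real MEMSIZE and FULLSTIMER system options; the PROC LOGISTIC step is the one from the example above, and the other fits in this thread (PHREG, NLMIXED) can be rerun the same way. The figures reported will depend on the platform and the data.

```sas
/* Report the session's memory limit (MEMSIZE is set at SAS startup) */
proc options option=memsize;
run;

/* Write timing and memory statistics to the log after every step */
options fullstimer;

/* Rerun each competing fit and compare the memory lines in the log */
proc logistic data=Data1;
   strata stratum;
   model y(event='1') = Gall Hyper;
run;
```

Scaling the test data up gradually and watching how the reported memory grows for each procedure is one way to see which approach fits a given machine.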