From: "Nick ." on 10 May 2006 12:19

Hello,

Can one of you, Jonas or Dave or Peter or someone, explain what this code is doing? I am trying to follow this thread and I am lost already. Having a sample dataset will help a lot!

NICK

----- Original Message -----
From: "Jonas Bilenas"
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: jackknife concept
Date: Wed, 10 May 2006 08:16:26 -0400

On Tue, 9 May 2006 17:49:53 -0400, Luo, Peter wrote:

> David, for what Jonas was trying to do, i.e. to get some 'error' estimates
> for model predictors, is N sub-samples or N bootstrapping samples the better
> method?

I modified the code a bit, based on suggestions from David. Similar but different results:

%macro boot(iter);

  proc surveyselect data=reg out=outdata
       rep=&ITER method=urs samprate=1 outhits;
  run;

  %do i=1 %to &iter;
    ods listing close;
    ods output ParameterEstimates=bout;
    proc logistic data=outdata;
      where replicate=&i;
      model bad=&ivs;
    run;
    ods output close;

    proc transpose data=bout out=bt&i;
      var estimate;
      id variable;
    run;
    %if "&i" ne "1" %then %do;
      proc append base=bt1 data=bt&i;
      run;
    %end;
  %end;
  ods listing;

  proc means data=bt1 mean min max std n nmiss;
  run;
%mend;

%boot(20);
From: NordlDJ on 10 May 2006 15:25

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of Jonas Bilenas
> Sent: Wednesday, May 10, 2006 5:16 AM
> To: SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: jackknife concept
>
> I modified the code a bit, based on suggestions from David. Similar but
> different results:

<<<snip>>>

Jonas,

You haven't incorporated one of the most important suggestions that David made, which is to use BY processing in PROC LOGISTIC. That will eliminate having to continually open and close the file of bootstrap samples, and the file will only have to be read through once. Remove the %DO loop and replace the WHERE statement with a BY statement. You can also eliminate the PROC TRANSPOSE and PROC APPEND steps. Something like the following (I'm not sure where the macro variable &ivs is defined):

%macro boot(iter);
  /* draw the bootstrap samples: with replacement, same size as the original */
  proc surveyselect data=reg out=outdata
       rep=&ITER method=urs samprate=1 outhits;
  run;

  ods listing close;
  ods output ParameterEstimates=bout;

  /* one pass through outdata, one model fit per replicate */
  proc logistic data=outdata;
    by replicate;
    model bad=&ivs;
  run;

  ods output close;
  ods listing;

  /* summarize the bootstrap estimates for each coefficient */
  proc means data=bout mean min max std n nmiss;
    class variable;
    var estimate;
    output out=estimate_summary;
  run;
%mend;

%boot(20);

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
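[Editor's sketch] The macro above depends on two things the thread never shows: a definition for &ivs, and a fixed random seed. The fragment below is a minimal sketch of one way to supply both. The predictor names and the seed value are placeholders chosen for illustration, not taken from Jonas's data. SEED= is a documented PROC SURVEYSELECT option; without it the seed comes from the system clock, so each run draws different bootstrap samples.

```sas
/* Hypothetical setup: the thread does not show where &ivs is defined,
   so these predictor names are placeholders only. */
%let ivs = x1 x2 x3;

/* SEED= fixes the random number stream so that the same bootstrap
   samples are drawn on every run; the value itself is arbitrary. */
proc surveyselect data=reg out=outdata seed=20060510
     rep=20 method=urs samprate=1 outhits;
run;
```

Defining &ivs with a global %LET before calling %boot is enough for the macro to resolve it; passing the predictor list as a second macro parameter would be an equally reasonable design.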
From: "Nick ." on 10 May 2006 15:56

Dan,

What is the objective of this macro? It will run 20 times and give you statistics on 20 different data sets? What is the objective of this macro, for those of us who don't understand what Jonas is trying to implement and how to interpret the results?

NICK

----- Original Message -----
From: "Nordlund, Dan (DSHS)"
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: jackknife concept
Date: Wed, 10 May 2006 12:25:43 -0700

<<<snip>>>
From: NordlDJ on 10 May 2006 17:16

> -----Original Message-----
> From: Nick . [mailto:ni14(a)mail.com]
> Sent: Wednesday, May 10, 2006 12:56 PM
> To: Nordlund, Dan (DSHS); SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: Re: jackknife concept
>
> Dan,
>
> What is the objective of this macro? It will run 20 times, it will give
> you statistics on 20 different data sets? What is the objective of this
> macro for those of us who don't understand what Jonas is trying to
> implement and how to interpret?
> NICK

<<<snip>>>

Nick,

I haven't got the time, space, or probably even the skill to adequately explain bootstrapping, but I will try to respond briefly. I am sure that David will be kind enough to correct me if I go too far astray. :-)

First, the fact that I used a macro here was simply because I was responding to what had been written. Unless I was going to try to create a much more general boot macro with many parameters for flexibility (and I wouldn't, because it's already been done), I would just write the basic code with the number of replications hard-coded.

I am oversimplifying here, but bootstrapping is based on the assumption that your original sample is representative of the population from which it is drawn. So sampling with replacement from your original sample will produce a sample similar to what you could have gotten if you took a new sample from the parent population. Now take many bootstrap samples and compute a desired statistic, say the mean, on each one. Then you can empirically estimate the sampling distribution of the statistic, rather than assuming that the distribution is normal (or some other distribution) and estimating the standard errors using standard formulas. Bootstrapping can also be useful in situations where you don't have an analytical solution for the standard error of your statistic.

Here is a toy example logistic regression that you could play with.
**create sample data;
data test;
  do i=1 to 100;
    y=i GT 50;
    x0=i+20*normal(1234);
    x1=uniform(4321) > .5;
    x2=normal(1234);
    x3=normal(1234);
    output;
  end;
run;

/**run your initial logistic regression. It might be
   instructive to compare the mean of the bootstrap
   sample estimated coefficients (below) to the
   estimates here
**/
proc logistic data=test;
  model y=x0 x1 x2 x3;
run;

/**create 20 bootstrap samples; in real life you would
   probably want many more;
**/
proc surveyselect data=test out=outdata
     rep=20 method=urs samprate=1 outhits;
run;

ods listing close;
ods output ParameterEstimates=bout;

/**run your analysis using BY processing; the ODS output
   statement will collect 20 sets of regression coefficients
   into one dataset, bout;
**/
proc logistic data=outdata;
  by replicate;
  model y=x0 x1 x2 x3;
run;

ods output close;
ods listing;

/**compute the mean and std. dev. of the 20 regression coefficients
   for each variable. The std. dev. is *an* estimate of the standard
   error of estimate for the original regression coefficients. You might
   then use these standard errors (or an empirically estimated confidence
   interval) to assess whether your estimated coefficients are
   different from zero
**/
proc means data=bout nway mean min max std n nmiss;
  class variable;
  var estimate;
  output out=estimate_summary;
run;

I hope this description has not been too far off the mark. Do not go out and try to bootstrap your own estimates using this partial, simplified explanation. I haven't dealt with a whole host of issues, including but not limited to things like bias estimation and whether you should be resampling cases or residuals.

I hope this has been helpful for following this discussion thread,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
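[Editor's sketch] Dan's toy example mentions two follow-ups without coding them: an empirically estimated confidence interval, and a comparison of the bootstrap means with the original estimates. The fragment below is one minimal way to do both, assuming the datasets test and bout from the example above already exist. The names orig, bout_sorted, boot_ci, compare, boot_mean, boot_se, and the ci_ prefix are illustrative choices, not from the thread. The interval shown is the simple percentile interval, and with only 20 replicates its endpoints are very rough; a real analysis would use far more replicates.

```sas
/* Capture the original-sample estimates in a dataset */
ods output ParameterEstimates=orig;
proc logistic data=test;
  model y=x0 x1 x2 x3;
run;
ods output close;

proc sort data=orig;
  by variable;
run;

/* Sort the bootstrap estimates so each coefficient forms a BY group */
proc sort data=bout out=bout_sorted;
  by variable;
run;

/* Bootstrap mean, standard deviation (a standard error estimate),
   and the 2.5th and 97.5th percentiles for each coefficient.
   PCTLPRE=ci_ names the percentile variables ci_2_5 and ci_97_5. */
proc univariate data=bout_sorted noprint;
  by variable;
  var estimate;
  output out=boot_ci mean=boot_mean std=boot_se
         pctlpts=2.5 97.5 pctlpre=ci_;
run;

/* Put the original and bootstrap results side by side.
   boot_mean - orig_est is a simple bootstrap estimate of bias. */
data compare;
  merge orig(keep=variable estimate stderr
             rename=(estimate=orig_est stderr=orig_se))
        boot_ci;
  by variable;
  bias = boot_mean - orig_est;
run;

proc print data=compare noobs;
run;
```

If a percentile interval in compare excludes zero, that is informal evidence that the coefficient differs from zero, subject to all the caveats Dan lists above about bias and about what is being resampled.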
From: Jonas Bilenas on 11 May 2006 07:47
On Wed, 10 May 2006 12:25:43 -0700, Nordlund, Dan (DSHS) <NordlDJ(a)DSHS.WA.GOV> wrote:

>Jonas,
>
>you haven't incorporated one of the most important suggestions that David
>made, which is to use BY processing in Proc Logistic.

<<<snip>>>

Thanks. The first time I tried David's code I didn't get statistics on the variable coefficients. This was helpful.

Jonas V. Bilenas
JP Morgan CHASE Bank
Decision Science