From: Aparna on 9 May 2006 08:08

Hi. Can anybody please explain the concept of the jackknife and the bootstrap? Are they concerned with regression alone? Can they be used in prediction intervals? Thanks
From: Peter Flom on 9 May 2006 08:46

>>> Aparna <aparnasprasad(a)GMAIL.COM> 5/9/2006 8:08 am >>>
<<< hi. can anybody pls explain me the concept of Jackknife and Bootstrap? are they concerned with Regression alone? can this be used in prediction interval? thanks >>>

This is a BIG topic, with a huge literature. Here is a VERY brief intro.

Both the bootstrap and the jackknife are resampling methods. They have many uses, but a lot of these center around finding variance estimates for statistics where there is no formula, or where the assumptions are violated.

So, to answer your questions: No, they are not limited to regression, and yes, they can be used to help with predictions. (I am not sure what you mean by 'prediction interval', but if you mean 'predict some value and estimate a confidence interval', then yes, they can do that.)

I am under the impression that the bootstrap is now used much more than the jackknife, and also that the jackknife was a sort of 'poor man's bootstrap' that was used more when computers weren't so blazingly fast.

The essential idea behind the bootstrap is more or less as follows: Take a sample. Now resample from that sample, with replacement. Do this many times. Use these resamples to estimate parameters and variances.

For a fairly readable, if somewhat dated, introduction, read Efron and Tibshirani, An Introduction to the Bootstrap. For a more recent, but much more technically demanding review, there is Davison and Hinkley, Bootstrap Methods and Their Application. There are tons and tons of articles as well, and many other books. But E and T is the seminal book.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St, New York, NY 10010
(212) 845-4485 (voice) (917) 438-0894 (fax)
http://cduhr.ndri.org
www.peterflom.com
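[Editor's illustration, not part of the thread: the resample-with-replacement recipe Peter describes can be sketched in a few lines of Python. The statistic (here the median, which has no simple closed-form standard error), the toy data, and all names are hypothetical.]

```python
import random
import statistics

random.seed(42)
sample = [random.gauss(50, 10) for _ in range(100)]  # the original sample

B = 1000  # number of bootstrap resamples
boot_medians = []
for _ in range(B):
    # Resample from the sample, with replacement, at the original size
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(statistics.median(resample))

# The spread of the resampled statistics estimates the sampling
# variability of the median -- its bootstrap standard error
boot_se = statistics.stdev(boot_medians)
print(round(boot_se, 3))
```

The same loop works for any statistic you can compute from a sample, which is the point Peter makes: the bootstrap substitutes computation for a variance formula that may not exist.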
From: Jonas Bilenas on 9 May 2006 10:41

On Tue, 9 May 2006 05:08:49 -0700, Aparna <aparnasprasad(a)GMAIL.COM> wrote:
>hi. can anybody pls explain me the concept of Jackknife and Bootstrap?
>are they concerned with Regression alone? can this be used in
>prediction interval? thanks

I typically use the bootstrap approach, as opposed to hold-out samples, to validate my models. One rule is that if a variable's coefficients change sign across iterations, then that variable should be dropped. Here is an example using logistic regression. Variable selection (not using stepwise) was built on the entire sample. This will be featured in the new book I am working on for SAS Press, SAS Applications in Credit Industry.

%macro bootstrap(mod_data,iter);
  ods listing close;
  %do i = 1 %to &iter;
    ods output clear;
    ods output ParameterEstimates=b&i;
    proc logistic data=&mod_data;
      model bad=&ivs_trim;   /* &ivs_trim holds the pre-selected predictor list */
      where ranuni(0)<=.9;   /* random ~90% subsample for this iteration */
    run;quit;
    ods output close;
    run;
    proc transpose data=b&i out=bt&i;
      var estimate;
      id variable;
    run;
    %if "&i" ne "1" %then %do;
      proc append base=bt1 data=bt&i;
      run;
    %end;
  %end;
  ods listing;
  proc means data=bt1 mean min max std n nmiss;
  run;
%mend;

%bootstrap(reg1,20);

Here is truncated OUTPUT:

The MEANS Procedure

Variable                 Mean       Minimum       Maximum
Intercept           0.9560223     0.6456173     1.3784958
tof24               0.6999331     0.5134410     0.8170087
cd_util            -0.4577382    -0.7089199    -0.2893133
nhistd3            -0.2086835    -0.3138207    -0.0920624
nocd                0.7812227     0.5508036     1.1057233
nodel               0.4298646     0.3049216     0.5502467
nonpromoinq        -0.0532590    -0.0666753    -0.0292599
ntrades1            0.0432712     0.0239419     0.0573544
ntrades2           -0.1167981    -0.1399701    -0.0960097
ntrades2_2          0.0024367     0.0016911     0.0031254
average_hc_cd_p22   0.1913840     0.1463953     0.2747190
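[Editor's illustration, not part of the thread: Jonas's sign-stability rule amounts to checking, for each variable, whether the minimum and maximum bootstrap estimates straddle zero. A minimal language-neutral sketch in Python, with hard-coded toy estimates standing in for the PROC MEANS output; all names are hypothetical.]

```python
# Bootstrap coefficient estimates per variable, one value per iteration.
# A variable whose estimates change sign is a candidate to drop.
boot_estimates = {
    "tof24":   [0.513, 0.699, 0.817],
    "cd_util": [-0.709, -0.458, -0.289],
    "shaky_x": [-0.120, 0.034, 0.051],  # changes sign across iterations
}

flagged = [var for var, est in boot_estimates.items()
           if min(est) < 0 < max(est)]
print(flagged)  # ['shaky_x']
```

In Jonas's output above, every Minimum/Maximum pair shares a sign, so no variable would be flagged by this rule.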
From: David L Cassell on 9 May 2006 14:20

aparnasprasad(a)GMAIL.COM wrote:
>hi. can anybody pls explain me the concept of Jackknife and Bootstrap?
>are they concerned with Regression alone? can this be used in
>prediction interval? thanks

[1] As Peter already said, the basic idea of both of these, as well as many similar resampling methodologies, is to use your sample of data to estimate some functional of the data, by sampling from your sample, over and over and over, to get a linearization of the functional, in the same way that you got a linearization back in calculus when you used Taylor series to approximate a complex function.

[2] No, they are not concerned with regression only. And they make fundamental implicit assumptions that no one bothers to warn you about, so they are not applicable everywhere. Don't use the naive bootstrap or jackknife on time series data, or sample survey data, or ...

[3] Can they be used in prediction intervals? Yes. Should they? No. Why not? Because in simple linear regression, you already have a nice, linear estimation form for your prediction intervals. Bootstrapping an already-linearized estimate is about as useful as changing a tire that isn't flat.

HTH,
David

--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
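[Editor's illustration, not part of the thread: the jackknife David and Peter mention recomputes the statistic with each observation left out in turn, then combines the leave-one-out values into a variance estimate. For the sample mean the jackknife variance reproduces the familiar s²/n exactly, which makes a handy sanity check. A sketch with toy data; all names are hypothetical.]

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

# Leave-one-out means: recompute the statistic n times, dropping one point
loo_means = [(sum(x) - xi) / (n - 1) for xi in x]
mbar = sum(loo_means) / n

# Jackknife variance estimate: (n-1)/n * sum of squared deviations
jack_var = (n - 1) / n * sum((m - mbar) ** 2 for m in loo_means)

classical_var = statistics.variance(x) / n  # s^2 / n, for comparison
print(round(jack_var, 6), round(classical_var, 6))
```

For nonlinear statistics the two no longer coincide, which is exactly when the jackknife (or bootstrap) earns its keep.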
From: Peter Flom on 9 May 2006 14:32
>>> David L Cassell <davidlcassell(a)MSN.COM> 5/9/2006 2:20 pm >>> wrote
<<< [2] No, they are not concerned with regression only. And they make fundamental implicit assumptions that no one bothers to warn you about, so they are not applicable everywhere. Don't use the naive bootstrap or jackknife on time series data, or sample survey data, or ... >>>

... and also not to estimate maxima or minima, or extreme quantiles. And be careful with things like factor analysis, where a big problem is that the signs of the factors are arbitrary, and averaging can lead to odd results (THAT error cost me a lot of hours to find and correct).

<<< [3] Can they be used in prediction intervals? Yes. Should they? No. Why not? Because in simple linear regression, you already have a nice, linear estimation form for your prediction intervals. Bootstrapping an already-linearized estimate is about as useful as changing a tire that isn't flat. >>>

OK, as I know David knows, this is fine if the assumptions of the model are met. But I was under the impression that bootstrapping can deal with some fairly serious violations of those assumptions. Am I incorrect?

Peter
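[Editor's illustration, not part of the thread: one standard answer to Peter's closing question is the "pairs" (case-resampling) bootstrap, which resamples (x, y) pairs rather than model residuals and so makes no assumption about the error distribution -- useful when, e.g., errors are heteroscedastic. A toy sketch; the data and all names are hypothetical.]

```python
import random

random.seed(1)
x = [float(i) for i in range(1, 31)]
# True slope 0.5, but the error spread grows with x (heteroscedastic)
y = [2.0 + 0.5 * xi + random.gauss(0, 0.1 * xi) for xi in x]

def ols_slope(xs, ys):
    """Ordinary least-squares slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

pairs = list(zip(x, y))
boot_slopes = []
for _ in range(2000):
    # Resample whole (x, y) cases with replacement, refit each time
    res = random.choices(pairs, k=len(pairs))
    xs, ys = zip(*res)
    boot_slopes.append(ols_slope(xs, ys))

boot_slopes.sort()
lo, hi = boot_slopes[49], boot_slopes[1949]  # rough 95% percentile interval
print(round(lo, 3), round(hi, 3))
```

Whether such intervals behave well under a given violation is exactly the kind of case-by-case question Peter raises; the percentile interval above is the simplest variant, not the most refined.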