From: Aparna on 9 May 2006 08:08

Hi. Can anybody please explain the concept of the jackknife and the bootstrap? Are they concerned with regression alone? Can they be used in prediction intervals? Thanks
From: Peter Flom on 9 May 2006 08:46

>>> Aparna <aparnasprasad(a)GMAIL.COM> 5/9/2006 8:08 am >>>
<<< hi. can anybody pls explain me the concept of Jackknife and Bootstrap? are they concerned with Regression alone? can this be used in prediction interval? thanks >>>

This is a BIG topic, with a huge literature. Here is a VERY brief intro.

Both the bootstrap and the jackknife are resampling methods. They have many uses, but a lot of these center around finding variance estimates for statistics where there is no formula, or where the assumptions are violated.

So, to answer your questions: No, they are not limited to regression, and yes, they can be used to help with predictions. (I am not sure what you mean by 'prediction interval', but if you mean 'predict some value and estimate a confidence interval', then yes, they can do that.)

I am under the impression that the bootstrap is now used much more than the jackknife, and also that the jackknife was a sort of 'poor man's bootstrap' that was used more when computers weren't so blazingly fast.

The essential idea behind the bootstrap is more or less as follows: Take a sample. Now resample from that sample, with replacement. Do this many times. Use these resamples to estimate parameters and variances.

For a fairly readable, if somewhat dated, introduction, read Efron and Tibshirani, An Introduction to the Bootstrap. For a more recent, but much more technically demanding review, there is Davison and Hinkley, Bootstrap Methods and Their Application. There are tons and tons of articles as well, and many other books. But E and T is the seminal book.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St, New York, NY 10010
(212) 845-4485 (voice) (917) 438-0894 (fax)
http://cduhr.ndri.org
www.peterflom.com
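[Editor's illustration, not part of the thread: the resample-with-replacement recipe Peter describes can be sketched in a few lines of Python. The statistic (here the median, which has no simple closed-form standard error), the toy data, and all names are hypothetical.]

```python
import random
import statistics

random.seed(42)
sample = [random.gauss(50, 10) for _ in range(100)]  # the original sample

B = 1000  # number of bootstrap resamples
boot_medians = []
for _ in range(B):
    # Resample from the sample, with replacement, at the original size
    resample = random.choices(sample, k=len(sample))
    boot_medians.append(statistics.median(resample))

# The spread of the resampled statistics estimates the sampling
# variability of the median -- its bootstrap standard error
boot_se = statistics.stdev(boot_medians)
print(round(boot_se, 3))
```

The same loop works for any statistic you can compute from a sample, which is the point Peter makes: the bootstrap substitutes computation for a variance formula that may not exist.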
From: Jonas Bilenas on 9 May 2006 10:41

On Tue, 9 May 2006 05:08:49 -0700, Aparna <aparnasprasad(a)GMAIL.COM> wrote:
>hi. can anybody pls explain me the concept of Jackknife and Bootstrap?
>are they concerned with Regression alone? can this be used in
>prediction interval? thanks

I typically use the bootstrap approach, as opposed to hold-out samples, to validate my models. One rule is that if a variable's coefficients change sign across iterations, then that variable should be dropped. Here is an example using logistic regression. Variable selection (not using stepwise) was built on the entire sample. This will be featured in the new book I am working on for SAS Press, SAS Applications in Credit Industry.

%macro bootstrap(mod_data,iter);
  ods listing close;
  %do i = 1 %to &iter;
    ods output clear;
    ods output ParameterEstimates=b&i;
    proc logistic data=&mod_data;
      model bad=&ivs_trim;   /* &ivs_trim holds the pre-selected predictor list */
      where ranuni(0)<=.9;   /* random ~90% subsample for this iteration */
    run;quit;
    ods output close;
    run;
    proc transpose data=b&i out=bt&i;
      var estimate;
      id variable;
    run;
    %if "&i" ne "1" %then %do;
      proc append base=bt1 data=bt&i;
      run;
    %end;
  %end;
  ods listing;
  proc means data=bt1 mean min max std n nmiss;
  run;
%mend;

%bootstrap(reg1,20);

Here is truncated OUTPUT:

The MEANS Procedure

Variable                 Mean       Minimum       Maximum
Intercept           0.9560223     0.6456173     1.3784958
tof24               0.6999331     0.5134410     0.8170087
cd_util            -0.4577382    -0.7089199    -0.2893133
nhistd3            -0.2086835    -0.3138207    -0.0920624
nocd                0.7812227     0.5508036     1.1057233
nodel               0.4298646     0.3049216     0.5502467
nonpromoinq        -0.0532590    -0.0666753    -0.0292599
ntrades1            0.0432712     0.0239419     0.0573544
ntrades2           -0.1167981    -0.1399701    -0.0960097
ntrades2_2          0.0024367     0.0016911     0.0031254
average_hc_cd_p22   0.1913840     0.1463953     0.2747190
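[Editor's illustration, not part of the thread: Jonas's sign-stability rule amounts to checking, for each variable, whether the minimum and maximum bootstrap estimates straddle zero. A minimal language-neutral sketch in Python, with hard-coded toy estimates standing in for the PROC MEANS output; all names are hypothetical.]

```python
# Bootstrap coefficient estimates per variable, one value per iteration.
# A variable whose estimates change sign is a candidate to drop.
boot_estimates = {
    "tof24":   [0.513, 0.699, 0.817],
    "cd_util": [-0.709, -0.458, -0.289],
    "shaky_x": [-0.120, 0.034, 0.051],  # changes sign across iterations
}

flagged = [var for var, est in boot_estimates.items()
           if min(est) < 0 < max(est)]
print(flagged)  # ['shaky_x']
```

In Jonas's output above, every Minimum/Maximum pair shares a sign, so no variable would be flagged by this rule.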
From: David L Cassell on 9 May 2006 14:20

aparnasprasad(a)GMAIL.COM wrote:
>hi. can anybody pls explain me the concept of Jackknife and Bootstrap?
>are they concerned with Regression alone? can this be used in
>prediction interval? thanks

[1] As Peter already said, the basic idea of both of these, as well as many similar resampling methodologies, is to use your sample of data to estimate some functional of the data, by sampling from your sample, over and over and over, to get a linearization of the functional, in the same way that you got a linearization back in calculus when you used Taylor series to approximate a complex function.

[2] No, they are not concerned with regression only. And they make fundamental implicit assumptions that no one bothers to warn you about, so they are not applicable everywhere. Don't use the naive bootstrap or jackknife on time series data, or sample survey data, or ...

[3] Can they be used in prediction intervals? Yes. Should they? No. Why not? Because in simple linear regression, you already have a nice, linear estimation form for your prediction intervals. Bootstrapping an already-linearized estimate is about as useful as changing a tire that isn't flat.

HTH,
David

--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
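[Editor's illustration, not part of the thread: the jackknife David and Peter mention recomputes the statistic with each observation left out in turn, then combines the leave-one-out values into a variance estimate. For the sample mean the jackknife variance reproduces the familiar s²/n exactly, which makes a handy sanity check. A sketch with toy data; all names are hypothetical.]

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

# Leave-one-out means: recompute the statistic n times, dropping one point
loo_means = [(sum(x) - xi) / (n - 1) for xi in x]
mbar = sum(loo_means) / n

# Jackknife variance estimate: (n-1)/n * sum of squared deviations
jack_var = (n - 1) / n * sum((m - mbar) ** 2 for m in loo_means)

classical_var = statistics.variance(x) / n  # s^2 / n, for comparison
print(round(jack_var, 6), round(classical_var, 6))
```

For nonlinear statistics the two no longer coincide, which is exactly when the jackknife (or bootstrap) earns its keep.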
From: Peter Flom on 9 May 2006 14:32
>>> David L Cassell <davidlcassell(a)MSN.COM> 5/9/2006 2:20 pm >>> wrote
<<< [2] No, they are not concerned with regression only. And they make fundamental implicit assumptions that no one bothers to warn you about, so they are not applicable everywhere. Don't use the naive bootstrap or jackknife on time series data, or sample survey data, or ... >>>

... and also not to estimate maxima or minima, or extreme quantiles. And be careful with things like factor analysis, where a big problem is that the signs of the factors are arbitrary, and averaging can lead to odd results (THAT error cost me a lot of hours to find and correct).

<<< [3] Can they be used in prediction intervals? Yes. Should they? No. Why not? Because in simple linear regression, you already have a nice, linear estimation form for your prediction intervals. Bootstrapping an already-linearized estimate is about as useful as changing a tire that isn't flat. >>>

OK, as I know David knows, this is fine if the assumptions of the model are met. But I was under the impression that bootstrapping can deal with some fairly serious violations of those assumptions. Am I incorrect?

Peter
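[Editor's illustration, not part of the thread: one standard answer to Peter's closing question is the "pairs" (case-resampling) bootstrap, which resamples (x, y) pairs rather than model residuals and so makes no assumption about the error distribution -- useful when, e.g., errors are heteroscedastic. A toy sketch; the data and all names are hypothetical.]

```python
import random

random.seed(1)
x = [float(i) for i in range(1, 31)]
# True slope 0.5, but the error spread grows with x (heteroscedastic)
y = [2.0 + 0.5 * xi + random.gauss(0, 0.1 * xi) for xi in x]

def ols_slope(xs, ys):
    """Ordinary least-squares slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

pairs = list(zip(x, y))
boot_slopes = []
for _ in range(2000):
    # Resample whole (x, y) cases with replacement, refit each time
    res = random.choices(pairs, k=len(pairs))
    xs, ys = zip(*res)
    boot_slopes.append(ols_slope(xs, ys))

boot_slopes.sort()
lo, hi = boot_slopes[49], boot_slopes[1949]  # rough 95% percentile interval
print(round(lo, 3), round(hi, 3))
```

Whether such intervals behave well under a given violation is exactly the kind of case-by-case question Peter raises; the percentile interval above is the simplest variant, not the most refined.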