From: zapissa eckle on
Dear all,
I have a problem with the confidence limit given by gpfit.
We use a generalized Pareto distribution to fit data.
I take the confidence intervals from the gpfit function, and plot the compelementary cdf i.e. F=1-cdf together with the data with the confidence limits. Here is my question: The confidence intervals appear to be extremely narrow, how is this possible? Am I misinterpreting this? According to the Matlab documentation
http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/gparetodemo.html
the confidence limits should contain 95% of possible probability distributions generated from the same distribution as generated by a bootstrap procedure. I tried to reproduce this with artificial data drawn from a pareto distribution in the file (see below) but again it seems the confidence intervals given by gpfit are much narrower than expected from the bootstrap procedure..
Can anyone give me a hint? Am I misinterpreting the values given in parmci?
Thank you very much,
zapissa
%% Example code to plot confidence intervals on data fit..
clear all
sampl=100;
%% generate artifical dataset from pareto dist
y=random('gp',0.4,20,1,[sampl,1]);
%% x axis
xx=1:round(max(y));

%% fit the data with gpfit
[parmhat,parmci] = gpfit (y-1,0.95)
kHat = parmhat(1); % Tail index parameter
sigmaHat = parmhat(2); % Scale parameter
conf1=1-cdf('gp',xx,parmci (1,1),parmci (1,2),1);
conf2=1-cdf('gp',xx,parmci (2,1),parmci (2,2),1);

replEsts = bootstrp(sampl,@gpfit,y);
for a=1:sampl
boot(:,a)=(1-cdf('gp',xx,replEsts(a,1),replEsts(a,2),1)); % calcualte analytical pdf
end
figure(1)
loglog(xx,boot(:,1:100),'black')
hold on
loglog(xx,conf1,'red',xx,conf2,'red','LineWidth',2)
hold off
From: Peter Perkins on
On 7/15/2010 4:10 AM, zapissa eckle wrote:
> I have a problem with the confidence limit given by gpfit.
> We use a generalized Pareto distribution to fit data.
> I take the confidence intervals from the gpfit function, and plot the
> compelementary cdf i.e. F=1-cdf together with the data with the
> confidence limits. Here is my question: The confidence intervals appear
> to be extremely narrow, how is this possible? Am I misinterpreting this?
> According to the Matlab documentation
> http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/gparetodemo.html
> the confidence limits should contain 95% of possible probability
> distributions generated from the same distribution as generated by a
> bootstrap procedure.

It does not say that as far as I can see.

A confidence interval is random. The definition is that in repeated
samples, the random interval will contain the unknown parameter
(1-alpha)% of the time. So the appropriate simulation to run is

n = 100;
theta = 1;
k = .4;
sigma = 20;

nreps = 10000;
parmhat = zeros(nreps,2);
parmci = zeros(nreps,2,2);
for i = 1:nreps
y = gprnd(k,sigma,theta,[n,1]);
[parmhat(i,:),parmci(i,:,:)] = gpfit(y-theta,0.05);
end
kCIcoverage = sum( (parmci(:,1,1) <= k) & (k <= parmci(:,2,1)) ) / nreps
sigmaCIcoverage = sum( (parmci(:,1,2) <= sigma) & (sigma <=
parmci(:,2,2)) ) / nreps

When I run that, I get coverage probabilities

kCIcoverage =
0.9241
sigmaCIcoverage =
0.9473

which is slightly lower than the stated 95% coverage, indicating that
the assumptions required for those CIs perhaps need a sample size larger
than 100 for the particular parameter values you've chosen.

There are a few things that I see wrong with your simulation:

1) The second arg to gpfit ought to be what is traditionally called
alpha, i.e., the CIs will have coverage probability (1-alpha). You've
passed in .95 when you probably meant .05.
2) You're assuming that you can take two individual CIs for the
parameters and transform those into a CI for the CDF. That may work in
some sense (since the CDF is increasing in both parameters), but seems
suspect to me. I can't tell if you're thinking of that CI as pointwise
or simultaneous.
3) Bootstrapping, as opposed to creating repeated samples from fixed
parameters as I've done, is perhaps reasonable, but the thing you would
want to bootstrap is the CIs, not the parameter estimates, to verify
their coverage probabilities.

Hope this helps.
From: zapissa on
Thank you very much, that really helps. Unfortunately I have to admit that it was really the wrong definition of the 1-alpha that was the problem and got all this experimenting with bootstrapping and random datasets started. So now that I am at least not far off any more I can start looking at the details and I agree that random samples should be more valid than bootstrapping.
Thanks again,
zapissa