From: zapissa eckle on 15 Jul 2010 04:10 Dear all, I have a problem with the confidence limit given by gpfit. We use a generalized Pareto distribution to fit data. I take the confidence intervals from the gpfit function, and plot the compelementary cdf i.e. F=1-cdf together with the data with the confidence limits. Here is my question: The confidence intervals appear to be extremely narrow, how is this possible? Am I misinterpreting this? According to the Matlab documentation http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/gparetodemo.html the confidence limits should contain 95% of possible probability distributions generated from the same distribution as generated by a bootstrap procedure. I tried to reproduce this with artificial data drawn from a pareto distribution in the file (see below) but again it seems the confidence intervals given by gpfit are much narrower than expected from the bootstrap procedure.. Can anyone give me a hint? Am I misinterpreting the values given in parmci? Thank you very much, zapissa %% Example code to plot confidence intervals on data fit.. clear all sampl=100; %% generate artifical dataset from pareto dist y=random('gp',0.4,20,1,[sampl,1]); %% x axis xx=1:round(max(y)); %% fit the data with gpfit [parmhat,parmci] = gpfit (y-1,0.95) kHat = parmhat(1); % Tail index parameter sigmaHat = parmhat(2); % Scale parameter conf1=1-cdf('gp',xx,parmci (1,1),parmci (1,2),1); conf2=1-cdf('gp',xx,parmci (2,1),parmci (2,2),1); replEsts = bootstrp(sampl,@gpfit,y); for a=1:sampl boot(:,a)=(1-cdf('gp',xx,replEsts(a,1),replEsts(a,2),1)); % calcualte analytical pdf end figure(1) loglog(xx,boot(:,1:100),'black') hold on loglog(xx,conf1,'red',xx,conf2,'red','LineWidth',2) hold off
From: Peter Perkins on 15 Jul 2010 11:05 On 7/15/2010 4:10 AM, zapissa eckle wrote: > I have a problem with the confidence limit given by gpfit. > We use a generalized Pareto distribution to fit data. > I take the confidence intervals from the gpfit function, and plot the > compelementary cdf i.e. F=1-cdf together with the data with the > confidence limits. Here is my question: The confidence intervals appear > to be extremely narrow, how is this possible? Am I misinterpreting this? > According to the Matlab documentation > http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/gparetodemo.html > the confidence limits should contain 95% of possible probability > distributions generated from the same distribution as generated by a > bootstrap procedure. It does not say that as far as I can see. A confidence interval is random. The definition is that in repeated samples, the random interval will contain the unknown parameter (1-alpha)% of the time. So the appropriate simulation to run is n = 100; theta = 1; k = .4; sigma = 20; nreps = 10000; parmhat = zeros(nreps,2); parmci = zeros(nreps,2,2); for i = 1:nreps y = gprnd(k,sigma,theta,[n,1]); [parmhat(i,:),parmci(i,:,:)] = gpfit(y-theta,0.05); end kCIcoverage = sum( (parmci(:,1,1) <= k) & (k <= parmci(:,2,1)) ) / nreps sigmaCIcoverage = sum( (parmci(:,1,2) <= sigma) & (sigma <= parmci(:,2,2)) ) / nreps When I run that, I get coverage probabilities kCIcoverage = 0.9241 sigmaCIcoverage = 0.9473 which is slightly lower than the stated 95% coverage, indicating that the assumptions required for those CIs perhaps need a sample size larger than 100 for the particular parameter values you've chosen. There are a few things that I see wrong with your simulation: 1) The second arg to gpfit ought to be what is traditionally called alpha, i.e., the CIs will have coverage probability (1-alpha). You've passed in .95 when you probably meant .05. 2) You're assuming that you can take two individual CIs for the parameters and transform those into a CI for the CDF. That may work in some sense (since the CDF is increasing in both parameters), but seems suspect to me. I can't tell if you're thinking of that CI as pointwise or simultaneous. 3) Bootstrapping, as opposed to creating repeated samples from fixed parameters as I've done, is perhaps reasonable, but the thing you would want to bootstrap is the CIs, not the parameter estimates, to verify their coverage probabilities. Hope this helps.
From: zapissa on 16 Jul 2010 02:57 Thank you very much, that really helps. Unfortunately I have to admit that it was really the wrong definition of the 1-alpha that was the problem and got all this experimenting with bootstrapping and random datasets started. So now that I am at least not far off any more I can start looking at the details and I agree that random samples should be more valid than bootstrapping. Thanks again, zapissa
|
Pages: 1 Prev: Setting colwidth for actuvex FlexArray Next: Setting colwidth for actuvex FlexArray |