From: naimead on 5 May 2010 07:09
Hello all,

Is it possible to calculate 95-percentile of a non-Gaussian distribution without having to define its exact distribution?

Thank you very much,
naimead
From: ImageAnalyst on 5 May 2010 11:13
Of course. Just use hist() to get the actual distribution, and then cumsum() on the histogram to figure out when you've passed the 95% point. No assumption of the actual mathematical form of the histogram distribution is necessary.
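A minimal sketch of the hist()/cumsum() idea, written in Python with only the standard library so that it is self-contained. The function name percentile_from_histogram, the default of 100 bins, and the choice to report the upper edge of the bin are illustrative assumptions, not anything stated in the thread. The result is only as precise as the bin width.

```python
from itertools import accumulate

def percentile_from_histogram(samples, pct=95.0, n_bins=100):
    """Approximate the pct-th percentile from a binned histogram.

    Hypothetical helper: bins the data, takes a running sum of the
    counts, and reports the upper edge of the first bin at which the
    running sum reaches pct percent of the samples.
    """
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo  # all values identical, nothing to bin
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    target = pct * len(samples) / 100.0
    for i, running_total in enumerate(accumulate(counts)):
        if running_total >= target:
            return hi if i == n_bins - 1 else lo + (i + 1) * width
    return hi

# Example: the integers 1..1000, whose 95% point lies near 950.
estimate = percentile_from_histogram(list(range(1, 1001)))
```

Using more bins narrows the error but never removes it; sorting the samples and picking the value at the right rank avoids binning altogether.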
From: Peter Perkins on 5 May 2010 11:14
On 5/5/2010 11:09 AM, naimead wrote:
> Is it possible to calculate 95-percentile of a non-Gaussian distribution without having to define its exact distribution?

If you have access to the Statistics Toolbox,

>> help prctile
 PRCTILE Percentiles of a sample.

>> help quantile
 QUANTILE Quantiles of a sample.
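For readers without the toolbox, a sample percentile can be computed by sorting. The sketch below is in Python and uses the nearest-rank definition; the function name nearest_rank_percentile is hypothetical. It is not a reimplementation of prctile, which, as far as I know, interpolates between sorted values, so the two can give slightly different answers on the same data.

```python
import math

def nearest_rank_percentile(samples, pct=95.0):
    """Return the smallest sample value such that at least pct percent
    of the samples are less than or equal to it (nearest-rank rule)."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    # 1-based rank of the percentile in the sorted data.
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

# Example: with 101 samples the 95-percentile is the 96th sorted value.
p95 = nearest_rank_percentile(list(range(1, 102)))
```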
From: Walter Roberson on 5 May 2010 11:49
naimead wrote:
> Is it possible to calculate 95-percentile of a non-Gaussian distribution without having to define its exact distribution?

Maybe.

In theory, the description of a distribution could be piecewise and exactly describe at least the last 5 percent; in such a case you could calculate the 95-percentile without knowing the distribution of the rest.

In situations where the distribution is defined by a set of data rather than by a formula, as you start at the beginning of the data and proceed through it, you can narrow down the range of where the 95-percentile would be. Can you locate the 95-percentile without examining all of the data? _Sometimes_ you can: if individual data values can recur, and you keep track of the bounds and find that the bounds of the 95-percentile fall entirely within a block of repeated data, then you can terminate early.

As a simple thought experiment on those lines, if your data set had 101 samples and you had examined 96 of them so far and had found they were all the same, then you do not need to examine the remaining data samples, as no matter what they are they would not be able to push the 95-percentile away from that repeated value. If, though, you had only examined 95 of the 101 data samples so far, you cannot know whether the values of the 6 unexamined samples might happen to all be above the repeated value: if they are, then the 95-percentile would be one of them, whereas if even one of them is less than or equal to the known repeated value, then the 95-percentile would be the repeated value in this hypothetical situation.

In situations where the distribution is defined by an unknown formula... I'm not sure... possibly if you had enough information _about_ the formula.

In situations where the distribution is defined by a known formula that has one or more parameters whose values are not currently known: _Sometimes_ you can. The known formula might be manipulable to calculate the mean and standard deviation in terms of a relationship between the unknown parameters, and you might know what the value of that relationship is without knowing the parameters themselves. For example, the mean and standard deviation might come down to the ratio of two unknown parameters, and you might know their ratio without knowing their exact values at the time of the calculation.
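The 101-sample thought experiment above can be checked mechanically. The following Python sketch assumes the nearest-rank definition of a percentile (so the 95-percentile of 101 samples is the 96th sorted value); the function name percentile_if_determined is hypothetical. It asks where the percentile would fall if every unseen sample were smaller than everything seen so far, and where it would fall if every unseen sample were larger; if both cases give the same value, the unseen samples cannot change the answer.

```python
import math

def percentile_if_determined(examined, total_count, pct=95.0):
    """Return the nearest-rank percentile of the full data set if the
    unseen samples cannot change it, otherwise None."""
    unseen = total_count - len(examined)
    if unseen < 0:
        raise ValueError("more examined samples than total_count")
    ordered = sorted(examined)
    rank = math.ceil(pct * total_count / 100.0)  # 1-based rank in full data
    low_rank = rank - unseen  # rank among examined if all unseen are smaller
    high_rank = rank          # rank among examined if all unseen are larger
    if low_rank < 1 or high_rank > len(ordered):
        return None  # one of the extreme cases falls outside the examined data
    low, high = ordered[low_rank - 1], ordered[high_rank - 1]
    return low if low == high else None

# 96 identical values out of 101: the answer is already fixed.
fixed = percentile_if_determined([7] * 96, 101)
# 95 identical values out of 101: the 6 unseen samples could still matter.
open_case = percentile_if_determined([7] * 95, 101)
```

Here fixed is 7 and open_case is None, matching the two cases described in the post.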
From: Walter Roberson on 5 May 2010 12:01
ImageAnalyst wrote:
> Of course. Just use hist() to get the actual distribution, and then
> cumsum() on the histogram to figure out when you've passed the 95%
> point. No assumption of the actual mathematical form of the histogram
> distribution is necessary.

The problem statement did not indicate that there is a set of samples that _define_ the distribution: it was a general enough question to apply to distributions defined by a formula whose parameters are not completely known.

If there is a set of samples, then one needs to know whether the samples define the distribution or have instead been sampled from the distribution. If the data has been sampled from a distribution, then you cannot use a histogram of the samples in order to find the 95-percentile, other than perhaps probabilistically. "This poll is accurate to within 3%, 19 times out of 20".
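The "probabilistically" caveat can be illustrated with a small simulation, sketched here in Python. Everything in it is an assumption chosen for illustration: the exponential distribution with rate 1 (whose exact 95-percentile is ln 20), the sample size of 200, the number of repeats, and the function names. Each repeat draws a fresh sample from the same distribution and takes its nearest-rank 95-percentile; the estimates scatter around the true value rather than reproducing it.

```python
import math
import random

def nearest_rank_percentile(samples, pct=95.0):
    """Nearest-rank sample percentile (hypothetical helper)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def repeated_estimates(n_samples=200, n_repeats=200, seed=0):
    """Estimate the 95-percentile from many independent samples
    drawn from the same Exponential(rate=1) distribution."""
    rng = random.Random(seed)
    return [
        nearest_rank_percentile(
            [rng.expovariate(1.0) for _ in range(n_samples)]
        )
        for _ in range(n_repeats)
    ]

TRUE_P95 = math.log(20.0)  # solves 1 - exp(-x) = 0.95
estimates = repeated_estimates()
lowest, highest = min(estimates), max(estimates)
```

Comparing lowest and highest against TRUE_P95 shows how far a single sample's 95-percentile can sit from the distribution's; a larger n_samples tightens the spread but does not eliminate it.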