From: Lihang Nong on
I have a set of data that is truncated below some value. I want to fit a PDF to the data so that I can estimate the density of the truncated portion. I know that the tail of the density must go to zero at the origin, i.e. end at (0,0). Is this possible with ksdensity? Or maybe with a parametric function? To illustrate this better, please refer to this plot:
http://i.imgur.com/uYeQo.jpg
In this graph, the white bars are truncated, and I want to be able to come up with something like the purple curve.
My ultimate goal is to be able to calculate the total number of samples using the estimated density function. If there is another way to do this, it would be ok as well.
Thanks!
From: Peter Perkins on
On 7/27/2010 5:04 PM, Lihang Nong wrote:
> I have a set of data that is truncated below some value. I want to be
> able to fit a PDF to the data so that it can estimate the density of the
> truncated portion. I know that the tail of the function must end at
> (0,0). Is this possible with ksdensity?

>> help ksdensity
   KSDENSITY Compute kernel density or distribution estimate
   [snip]
      'support'   Either 'unbounded' (default) if the density can extend
                  over the whole real line, or 'positive' to restrict it to
                  positive values, or a two-element vector giving finite
                  lower and upper limits for the support of the density.
From: L N on
I already tried that, and it doesn't seem to work. The 'support' option only ensures that the estimated density lies within the bounds; it doesn't force the density to go to zero at the bounds. Perhaps I should try estimating with a parametric function?

From: Peter Perkins on
Lihang, sorry, I didn't read your original post clearly enough. If you
know how many observations fall below (about) 3.3, then you have what
is known as left-censoring. If you don't know how many, then you have
truncation. I'll assume you mean truncation.
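
The distinction matters because it changes the likelihood that is maximized. Here is a minimal sketch in Python (not MATLAB), assuming a normal model purely for illustration; the function names are hypothetical and the cutoff of 3.3 is the approximate value read off the plot:

```python
import math

def norm_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))

def loglik_truncated(data, cut, mu, sigma):
    """Left truncation: values below `cut` were never recorded and their
    count is unknown, so each density is renormalized by P(X >= cut)."""
    log_tail = math.log(1.0 - norm_cdf(cut, mu, sigma))
    return sum(norm_logpdf(x, mu, sigma) - log_tail for x in data)

def loglik_left_censored(data, cut, n_below, mu, sigma):
    """Left censoring: `n_below` values are known to lie below `cut`,
    so each of them contributes P(X < cut) to the likelihood."""
    observed_part = sum(norm_logpdf(x, mu, sigma) for x in data)
    return observed_part + n_below * math.log(norm_cdf(cut, mu, sigma))

# Made-up values above the cutoff, for illustration only.
sample = [3.5, 4.0, 4.2, 5.1]
print(loglik_truncated(sample, 3.3, 4.0, 1.0))
print(loglik_left_censored(sample, 3.3, 2, 4.0, 1.0))
```

With truncation the count below the cutoff never enters the likelihood, which is why it has to be inferred from the fitted model afterwards.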

You can use KSDENSITY with bounds on the support to estimate the density
above your truncation point. You could even use KSDENSITY with no
bounds to estimate the portion of the distribution below the
truncation point, but that estimate would be based on almost no real
information, and as you say, has no constraint that it extend to zero.
A non-parametric estimate isn't going to give you what you want, because
there are no data in the region you care about.
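
To make that concrete, here is a rough Python sketch (not MATLAB, and not ksdensity itself) of how little mass an unbounded Gaussian kernel estimate puts below the truncation point. Everything in it is an assumption for illustration: the data are simulated from a normal distribution with mean 4 and standard deviation 1, cut at 3.3, and the bandwidth is a common rule of thumb that may differ from ksdensity's default.

```python
import math
import random
import statistics

def std_norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def kde_mass_below(data, cut, bandwidth):
    """Mass that a Gaussian-kernel density estimate assigns below `cut`.
    Each kernel is a normal centered on a data point, so its share of
    mass below `cut` is a normal CDF value; the estimate averages them."""
    return sum(std_norm_cdf((cut - x) / bandwidth) for x in data) / len(data)

# Simulated data (assumption): only values at or above the cutoff are kept.
rng = random.Random(0)
cut = 3.3
draws = [rng.gauss(4.0, 1.0) for _ in range(5000)]
observed = [x for x in draws if x >= cut]

# Rule-of-thumb bandwidth (assumption, not necessarily ksdensity's default).
h = 1.06 * statistics.stdev(observed) * len(observed) ** (-0.2)

kde_fraction = kde_mass_below(observed, cut, h)
missing_fraction = 1.0 - len(observed) / len(draws)
print("mass the kernel estimate puts below the cutoff:", kde_fraction)
print("fraction of the simulated data actually removed:", missing_fraction)
```

The mass below the cutoff comes only from kernels centered on points just above it, so it reflects the bandwidth rather than the shape of the missing tail.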

I think most people would probably fit a truncated parametric
distribution. Looking at your plot, perhaps the extreme value or GEV
(or a mirror image of one of those), but I leave that up to you.
There's an example of fitting truncated distributions here:

<http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/customdist1demo.html#11>

Hope this helps.
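
For the original goal of estimating the total number of samples, the truncated fit suggested above can be sketched as follows. This is Python rather than MATLAB, and it rests on stated assumptions: a normal model stands in for whichever family actually suits the data (the extreme value or GEV would need their own densities), the data are simulated, and `fit_truncated_normal` and `upper_tail` are hypothetical helper names. The optimizer is a simple compass search chosen to keep the example dependency-free.

```python
import math
import random

def upper_tail(cut, mu, sigma):
    """P(X >= cut) for X ~ Normal(mu, sigma)."""
    return 0.5 * math.erfc((cut - mu) / (sigma * math.sqrt(2.0)))

def fit_truncated_normal(data, cut):
    """Maximum likelihood fit of a normal left-truncated at `cut`."""
    n = len(data)
    sx = sum(data)
    sxx = sum(x * x for x in data)

    def nll(params):
        # Negative log-likelihood, up to an additive constant.
        mu, log_sigma = params
        sigma = math.exp(log_sigma)
        tail = upper_tail(cut, mu, sigma)
        if tail <= 0.0:
            return math.inf
        ss = sxx - 2.0 * mu * sx + n * mu * mu  # sum of (x - mu)^2
        return 0.5 * ss / sigma ** 2 + n * log_sigma + n * math.log(tail)

    # Start from the naive mean and standard deviation of the observed data.
    mean = sx / n
    best = [mean, 0.5 * math.log(sxx / n - mean * mean)]
    f_best = nll(best)
    step = 0.5
    for _ in range(100000):
        if step <= 1e-6:
            break
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                f_trial = nll(trial)
                if f_trial < f_best:
                    best, f_best, improved = trial, f_trial, True
        if not improved:
            step *= 0.5  # no neighbor is better: refine the search
    return best[0], math.exp(best[1])

# Simulated example (assumption): 20000 draws, values below 3.3 are lost.
rng = random.Random(0)
cut = 3.3
draws = [rng.gauss(4.0, 1.0) for _ in range(20000)]
observed = [x for x in draws if x >= cut]

mu_hat, sigma_hat = fit_truncated_normal(observed, cut)
# Observed count divided by the fitted probability of being observed.
n_total_hat = len(observed) / upper_tail(cut, mu_hat, sigma_hat)
print("fitted mu and sigma:", mu_hat, sigma_hat)
print("observed:", len(observed), "estimated total:", n_total_hat)
```

The total is the observed count divided by the fitted probability of landing above the cutoff, so the estimate is only as good as the assumed family's tail below the cutoff.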
From: L N on
Thanks Peter. One more question: I wanted to use a non-parametric estimate because it doesn't assume any model. Is it generally considered ok to choose a parametric distribution based only on closeness of fit? What if the underlying process doesn't correspond to the distribution model at all?
