BinCounts to InterpolatingFunction [Mathematica]


From: Kevin J. McCann on 27 Apr 2010 04:06

I am using a Markov Chain Monte Carlo (MCMC) approach to evaluate a
multidimensional probability density function. The output is a large
number of multidimensional points {x1,x2,...,xn}. I can use BinCounts to
gather the points into a PDF (after appropriate normalization). I would
like to then define a function, p[X_], which is the multidimensional
interpolation of the BinCounts output, but I can't figure out how to
automate this for an arbitrary number of dimensions.

Any ideas?

For the 2d case I did the following:

tbl = Flatten[
   Table[
    (* centre of bin (i, j) and its normalized density *)
    {xmin + (i + 1/2) \[CapitalDelta]x, ymin + (j + 1/2) \[CapitalDelta]y,
     counts[[i + 1, j + 1]]/(\[ScriptCapitalN] \[CapitalDelta]x \[CapitalDelta]y)},
    {i, 0, nx - 1}, {j, 0, ny - 1}],
   1];

f = Interpolation[tbl]

But as you can see, this is not easily extended to higher dimensions.

Kevin

From: dh on 27 Apr 2010 08:48

Hi Kevin,
if I understand correctly, your problem is generating a suitable grid of
data points for Interpolation.
Assume you have a function bins[{i1,i2,...,in}] that takes a list of n
integer indices, where the k-th index runs from 0 to nk-1. The list of
the nk is called bounds={n1,n2,...,nn}. We can now define a function
dataGrid that creates the rectangular multidimensional structure that
Interpolation expects as input:

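dataGrid[bins_, bounds_] := Module[{iter},
  (* iterator specs {x1, 0, n1 - 1}, {x2, 0, n2 - 1}, ...;
     x, n and the xk are global symbols and must not have values *)
  iter = {x, 0, n - 1} /.
    Table[{x -> Symbol["x" <> ToString[i]], n -> bounds[[i]]},
     {i, 1, Length[bounds]}];
  (* each grid point becomes {{x1, x2, ...}, bins[{x1, x2, ...}]};
     Flatten merges the grid levels into one list of such pairs *)
  Flatten[
   Table[{iter[[All, 1]], bins[iter[[All, 1]]]},
    Evaluate[Sequence @@ iter]],
   Length[bounds] - 1]
  ]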

As an example, take the product of the indices as bins; an interpolation
is then obtained with:

bins[v : {_ ..}] := Times @@ v;
Interpolation@dataGrid[bins, {4, 4, 4}]
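A minimal sketch of the same idea applied directly to BinCounts output, assuming uniform bins with lower edges mins = {xmin, ymin, ...} and widths deltas = {dx, dy, ...}, normalized by the total sample count npts as in the 2D code above. The names binDensity, f and p are hypothetical. MapIndexed supplies each bin's 1-based index list, which is turned into the bin centre, so no iterator symbols are needed. It is intended for two or more dimensions:

```mathematica
(* Hypothetical helper: n-dimensional BinCounts array -> InterpolatingFunction
   over the bin centres.  mins, deltas: lower edges and bin widths per axis;
   npts: total number of samples used for the normalization. *)
binDensity[counts_, mins_List, deltas_List, npts_] :=
 Module[{d = Length[mins], data},
  data = Flatten[
    MapIndexed[
     (* #2 is the 1-based index list {i1, ..., id}; the centre of that
        bin is mins + (ik - 1/2) deltas, componentwise *)
     {mins + (#2 - 1/2) deltas, #1/(npts Times @@ deltas)} &,
     counts, {d}],
    d - 1];
  (* data is a list of {{x1, ..., xd}, density} pairs on a regular grid *)
  Interpolation[data]
  ]

(* usage, with synthetic 3D normal samples standing in for MCMC output *)
pts = RandomReal[NormalDistribution[], {10000, 3}];
specs = Table[{-3, 3, 0.5}, {3}];
counts = BinCounts[pts, Sequence @@ specs];
f = binDensity[counts, specs[[All, 1]], specs[[All, 3]], Length[pts]];
(* p takes a point vector, as in p[X_] from the original question *)
p[X_List] := f @@ X;
p[{0., 0., 0.}]
```

Samples outside the binned range are not counted but still enter npts, so the bins only integrate to the fraction of samples that fell inside the range.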

cheers, Daniel

--

Daniel Huber
Metrohm Ltd.
Oberdorfstr. 68
CH-9100 Herisau
Tel. +41 71 353 8585, Fax +41 71 353 8907
E-Mail:<mailto:dh(a)metrohm.com>
Internet:<http://www.metrohm.com>

From: Kurt TeKolste on 29 Apr 2010 02:52

If I understand this algorithm correctly, it feeds the counts for all of
the bins into Interpolation. If that is the case, read on.

One of the problems with multidimensional data is that it takes very
large samples to fill the huge multidimensional volume. In other words,
it is hard to make the bins fine enough in all dimensions without almost
all of the bin counts being zero.

I suspect that the interpolation will not be very satisfying unless your
sample size is huge or you only need relatively coarse bins. Note that
dividing each of four dimensions into 20 bins already gives
20^4 = 160,000 bins, so the average probability that a randomly chosen
sample lands in any particular bin is 1/160000 = 6.25x10^-6. It takes a
long time for a Monte Carlo sample to look like the real distribution ...
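The bin count and per-bin probability are quick to check, and a small experiment shows how many bins stay empty even in a favourable case. This sketch assumes 100,000 independent samples drawn uniformly on the unit 4-cube (an assumption for illustration; MCMC output is correlated and usually far less uniform, which makes the sparsity worse). For uniform samples the expected empty fraction is (1 - 1/160000)^100000, about e^-0.625, or roughly 0.54:

```mathematica
nBins = 20^4
N[1/nBins]

(* 10^5 uniform samples in the unit 4-cube, 20 bins per axis *)
pts = RandomReal[1, {100000, 4}];
counts = BinCounts[pts, Sequence @@ Table[{0, 1, 1/20}, {4}]];
(* fraction of the 160,000 bins that received no sample at all *)
emptyFraction = N[Count[Flatten[counts], 0]/nBins]
```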

I am not an expert in this area, but I would be tempted to use only the
bins with non-zero values. I recall reading about some techniques for
dealing with this (something about concentrating the sampling where the
density is highest), but I do not recall the reference. Also, if you
start with an a priori distribution rather than trying to construct the
distribution solely from the data, you have more tools available.
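One way to keep only the non-empty bins, sketched under the assumption of uniform bins with lower edges mins and widths deltas, normalized by the sample count npts (the name nonzeroBins is hypothetical), is to let SparseArray drop the zeros and read the remaining entries back with ArrayRules. What to fit to these scattered points is left open here:

```mathematica
(* Hypothetical helper: list of {{bin centre}, density} for non-empty bins only *)
nonzeroBins[counts_, mins_List, deltas_List, npts_] :=
 Module[{rules},
  (* ArrayRules lists {i1, ..., id} -> count for each nonzero entry and
     ends with a default rule {_, ..., _} -> 0, which Most drops *)
  rules = Most[ArrayRules[SparseArray[counts]]];
  (* convert 1-based indices to bin centres and counts to densities *)
  {mins + (First[#] - 1/2) deltas, Last[#]/(npts Times @@ deltas)} & /@ rules
  ]
```

The resulting list grows with the number of occupied bins rather than with the total number of bins, although BinCounts itself still builds the full dense array first.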

ekt

From: DrMajorBob on 30 Apr 2010 05:49

If Kevin wants to approximate a PDF, perhaps he should start with the
sample CDF, interpolate and smooth it, then differentiate.
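A one-dimensional sketch of this suggestion, assuming samples pts and a bin grid {a, b, dx} that covers them (the name cdfDensity is hypothetical). The empirical CDF is tabulated at the bin edges with Accumulate, interpolated, and differentiated; the only smoothing here is the coarseness of the edge grid, so a proper smoother could take its place. In several dimensions the density would be the mixed partial derivative of the multidimensional CDF, which this sketch does not cover:

```mathematica
(* Hypothetical helper: 1D density estimate as the derivative of an
   interpolated empirical CDF *)
cdfDensity[pts_List, {a_, b_, dx_}] :=
 Module[{counts, edges, cdf, F},
  counts = BinCounts[pts, {a, b, dx}];
  (* edges a, a + dx, ..., one more than the number of bins *)
  edges = a + dx Range[0, Length[counts]];
  (* fraction of samples between a and each edge (only samples in [a, b) count) *)
  cdf = Prepend[Accumulate[counts], 0]/Length[pts];
  F = Interpolation[Transpose[{edges, cdf}]];
  (* the derivative F' can be evaluated like any other function *)
  F'
  ]

pts = RandomReal[NormalDistribution[], 20000];
dens = cdfDensity[pts, {-4, 4, 0.5}];
dens[0.]
```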

Bobby

--
DrMajorBob(a)yahoo.com
