From: Matt J on
"John D'Errico" <woodchips(a)rochester.rr.com> wrote in message <i24qj2$pi6$1(a)fred.mathworks.com>...

>
> No, if you bothered to look at my test, it was done
> in 5 dimensions. Yes, if you solve a very different
> problem from that which was asked about, the time
> will be different.
========

Then we have no disagreement.

In case my earlier posts weren't clear, I'm no longer all that concerned with the specific case raised by the OP (no offense, Jay). I'm more interested in how we might do this more cheaply if we wanted to do it in higher dimensions (and why wouldn't we?).

Since the OP has already been given at least one solution that will work, I think it's not unfair to let the thread stray onto related tangents...
From: Roger Stafford on
"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i24pio$k5f$1(a)fred.mathworks.com>...
> Roger- You're right. I had a mistake. What I really meant to give was this:
>
> A=cumsum(rand(1,6)); A=A/A(end); A(end)=[];
>
> I reran John's 2D test on this and find that it covers the correct triangular area, though slightly less uniformly than the sorting method.
>
> Again, though, for me, this was all just an exercise in seeing if we could get something nearly as good using cheaper summations instead of sorting.
- - - - - - - - - - -
Matt, I checked out the two-dimensional plot for your revised code:

A=cumsum(rand(1,6)); A=A/A(end); A(end)=[];

It does cover the correct triangle. However, it is grossly inaccurate to say that it is only "slightly less uniform than the sorting method." The probability area density actually drops down to zero at each corner of that triangle, whereas in a good solution it ought to be a uniform plateau throughout the entire triangle's area. This disparity would continue to worsen as the number of variables increases, though unfortunately it is difficult to illustrate this fact with plots.
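
A quick two-dimensional sketch of that comparison (just an illustration, not code from any earlier post; N is an arbitrary sample count, and bsxfun is assumed available):

N = 1e4;
S = sort(rand(N,2),2);                    % sorting method: exactly uniform over 0 < a < b < 1
C = cumsum(rand(N,3),2);
C = bsxfun(@rdivide, C(:,1:2), C(:,3));   % your revised method, vectorized; the dropped last column is 1
subplot(1,2,1), plot(S(:,1),S(:,2),'.'), axis square, title('sort')
subplot(1,2,2), plot(C(:,1),C(:,2),'.'), axis square, title('cumsum')

The thinning of the cumsum scatter near the triangle's corners is the density drop-off described above.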

Roger Stafford
From: Matt J on
"Roger Stafford" <ellieandrogerxyzzy(a)mindspring.com.invalid> wrote in message <i24pu1$d60$1(a)fred.mathworks.com>...

> Matt, you shouldn't give people solutions that are distinctly inferior just because their code would run faster.
================

Roger, see my revision in Message #18. It seems to be a better contender.

In any case, yes, I would hate for Jay to walk away without knowing the limitations of the solutions we propose, but I'm still feeling my way through it myself.

Even if I haven't figured out exactly how, it seems distinctly intuitive that we should be able to derive this with cumsum: the jumps between a<b<c<d, etc. form a positive-valued Markov process, much like cumsum(rand(1,N)), so you would think the exact solution could be derived along those lines.
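
One standard fact that would make the idea exact (offering it as a sketch, not as anything established in this thread): if the increments are exponential rather than uniform, the normalized partial sums are distributed exactly like sorted uniforms:

E = -log(rand(1,6));                      % six iid exponential(1) increments, via the inverse CDF
A = cumsum(E); A = A/A(end); A(end)=[];   % same form as my revised code
% A now has exactly the distribution of sort(rand(1,5))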

Should I be brainstorming out loud on the NG? Debatable, but I've seen lots of people here do it...
From: Matt J on
"Roger Stafford" <ellieandrogerxyzzy(a)mindspring.com.invalid> wrote in message <i24s3b$58o$1(a)fred.mathworks.com>...

> It does cover the correct triangle. However it is grossly inaccurate to say that it is only "slightly less uniform than the sorting method."
==============

I was going by an eyeball assessment of John's plots. Those plots don't give a full picture of the distribution, but only salt-and-pepper sampling patterns (which were slightly more salty than peppery for the cumsum method).


> The probability area density actually drops down to zero at each corner of that triangle, whereas in a good solution it ought to be a uniform plateau throughout the entire triangle's area. This disparity would continue to worsen as the number of variables increases, though unfortunately it is difficult to illustrate this fact with plots.
===========

So you're saying it's more like a Gaussian distribution over the triangle? That's strange. However, surely this is a much more reasonable contender than my earlier version, considering (a) that Jay never said whether he was interested in a uniform or a Gaussian distribution and (b) that it's more efficient to generate.


From: Roger Stafford on
"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i24u36$g1a$1(a)fred.mathworks.com>...
> ............
> So you're saying it's more like a Gaussian distribution over the triangle? That's strange. However, surely this is a much more reasonable contender than my earlier version, considering (a) that Jay never said whether he was interested in a uniform or a Gaussian distribution and (b) that it's more efficient to generate.
- - - - - - - - -
Yes, I agree your revised version is a more reasonable contender than the previous one, Matt. I used your earlier code blindly, but I should have realized that you surely meant something else.

This later version is rather similar in a sense to the solutions that have often been given in this group for the problem I mentioned earlier of n random variables with a predetermined sum, where n rand values are taken and each is divided by their sum and multiplied by the desired sum value. Both techniques tend to concentrate values in the center regions at the expense of the outer regions - that is, disproportionately to the n-dimensional volumes of those regions. And yes, for large n they begin to approach Gaussian distributions (the central limit theorem at work again).
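
For reference, that predetermined-sum construction looks like this (n and S are just placeholder names for the count and the target sum):

n = 5; S = 2;        % example values, not from any particular post
x = rand(1,n);
x = x*(S/sum(x));    % x now sums to S, but its values crowd toward S/n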

For that reason such methods don't satisfy the principle I mentioned earlier of generating the variables in such a manner that they are equivalent, statistically speaking, to a process that generates the variables without constraints but then rejects all that don't satisfy the constraints.
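
For the five-variable case here, that reference process would be the following (exact but wasteful - on average only 1 draw in 5! = 120 is accepted):

A = rand(1,5);
while any(diff(A) <= 0)   % reject unless already strictly increasing
    A = rand(1,5);
end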

Roger Stafford