How to generate random variables with a constraint? [MATLAB]

From: Matt J on 20 Jul 2010 12:17

Walter Roberson <roberson(a)hushmail.com> wrote in message <YKj1o.92592$Lj2.82402(a)newsfe05.iad>...

> You are correct, the OP placed no such restriction in the question,
> including not requiring that the random numbers be drawn from a uniform
> random distribution.
===========

Assuming he was, though, I'm still wondering if the following would do it. For larger numbers of variables, it would be good to have a way of doing this without using sort().

A=cumsum(rand(1,6)); A=A/sum(A); A(end)=[];

From: Walter Roberson on 20 Jul 2010 12:24

Matt J wrote:
> Walter Roberson <roberson(a)hushmail.com> wrote in message
> <YKj1o.92592$Lj2.82402(a)newsfe05.iad>...
>
>> You are correct, the OP placed no such restriction in the question,
>> including not requiring that the random numbers be drawn from a
>> uniform random distribution.
> ===========
>
> Assuming he was, though, I'm still wondering if the following would do
> it. For larger numbers of variables, it would be good to have a way of
> doing this without using sort().
>
> A=cumsum(rand(1,6)); A=A/sum(A); A(end)=[];

I would tend to doubt that that would work to generate well-distributed
points on the simplex. The fundamental problem with using the sum
approach is that even though any one A(K) value is independent, as you
add them together, the sum approaches the normal distribution, as per
the Central Limit Theorem, and so the generated A vectors would tend to
cluster towards the centroid of the simplex. I don't see at the moment
how generating an extra value and discarding would resolve that problem.

From: someone on 20 Jul 2010 12:37

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i24i60$gfp$1(a)fred.mathworks.com>...
> Walter Roberson <roberson(a)hushmail.com> wrote in message <YKj1o.92592$Lj2.82402(a)newsfe05.iad>...
>
> > You are correct, the OP placed no such restriction in the question,
> > including not requiring that the random numbers be drawn from a uniform
> > random distribution.
> ===========
>
> Assuming he was, though, I'm still wondering if the following would do it. For larger numbers of variables, it would be good to have a way of doing this without using sort().
>
> A=cumsum(rand(1,6)); A=A/sum(A); A(end)=[];

Wow, I have to admit that I didn't put a lot of thought into my initial solution.
I simply reasoned that the constraint that a<b<c<d<e was really no constraint at all.
Using sort was (in my mind) just a way of "relabeling" the a, b, c, d, & e variables.
The only "gotcha" would be if rand returned an equality (which seemed like
a pretty unlikely event with an "easy" fix). Did I miss something?
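
The relabeling idea above can be sketched as follows. This assumes the five values are meant to be uniform on (0,1), which the thread has not confirmed; the variable name x and the redraw loop for ties are illustrative choices, not something the original poster specified.

```matlab
% Draw five independent uniform values and sort them, so that the
% sorted entries play the roles of a, b, c, d, e.
x = sort(rand(1,5));

% The "easy fix" for the unlikely tie: redraw until all values differ.
while any(diff(x) <= 0)
    x = sort(rand(1,5));
end

% Relabel the sorted values; a < b < c < d < e now holds.
a = x(1); b = x(2); c = x(3); d = x(4); e = x(5);
```

Sorting independent uniform draws gives a vector that is uniformly distributed over the region where the ordering holds, which is why the constraint behaves like a relabeling.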

From: Matt J on 20 Jul 2010 12:59

Walter Roberson <roberson(a)hushmail.com> wrote in message <K4k1o.93283$Lj2.50698(a)newsfe05.iad>...

> I would tend to doubt that that would work to generate well-distributed
> points on the simplex. The fundamental problem with using the sum
> approach is that even though any one A(K) value is independent, as you
> add them together, the sum approaches the normal distribution, as per
> the Central Limit Theorem, and so the generated A vectors would tend to
> cluster towards the centroid of the simplex. I don't see at the moment
> how generating an extra value and discarding would resolve that problem.
==================

I'm not really seeing that argument. We have A/sum(A), where A=cumsum(rand(1,N)).
The central limit theorem says that A(end) ---> 0.5*N + sqrt(N/12)*randn
The law of large numbers says that sum(A) ---> 0.25*N^2, since sum(A) adds up all N running totals

This means that A(end)/sum(A) ---> 0 (roughly like 2/N) as N-->inf, as we would expect it to.
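
A quick numerical check of this limit, as a sketch only. It assumes A = cumsum(rand(1,N)) as in the one-liner under discussion; the names Ns and ratio and the chosen values of N are illustrative. The comparison value 2/(N+1) is the ratio of the means E[A(end)] = N/2 and E[sum(A)] = N*(N+1)/4.

```matlab
% Compare A(end)/sum(A) with 2/(N+1) for growing N.
Ns = [10 100 1000 10000];          % illustrative problem sizes
ratio = zeros(size(Ns));
for k = 1:numel(Ns)
    N = Ns(k);
    A = cumsum(rand(1,N));         % running totals of N uniform draws
    ratio(k) = A(end)/sum(A);      % last entry after dividing by sum(A)
end

% Rows: N, observed ratio, ratio of the means 2/(N+1).
disp([Ns; ratio; 2./(Ns+1)]);
```

The observed ratio shrinks toward zero as N grows, but this says nothing about how the remaining entries are distributed, which is the question the rest of the thread turns on.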

From: John D'Errico on 20 Jul 2010 13:10

"Matt J " <mattjacREMOVE(a)THISieee.spam> wrote in message <i24i60$gfp$1(a)fred.mathworks.com>...
> Walter Roberson <roberson(a)hushmail.com> wrote in message <YKj1o.92592$Lj2.82402(a)newsfe05.iad>...
>
> > You are correct, the OP placed no such restriction in the question,
> > including not requiring that the random numbers be drawn from a uniform
> > random distribution.
> ===========
>
> Assuming he was, though, I'm still wondering if the following would do it. For larger numbers of variables, it would be good to have a way of doing this without using sort().
>
> A=cumsum(rand(1,6)); A=A/sum(A); A(end)=[];

This is indeed massively biased! To convince yourself
that it does not produce a random sampling, or even
the correct sampling of the required domain, try it in
2 dimensions!

n = 10000;
A = cumsum(rand(n,3),2);
A = bsxfun(@rdivide,A,sum(A,2));
A(:,3) = [];
plot(A(:,1),A(:,2),'.')

The domain of interest here SHOULD be a triangle,
but not the one shown. Instead, try this:

B = sort(rand(n,2),2);
plot(B(:,1),B(:,2),'.')
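
The same comparison can be made without a plot by looking at the range each method covers. This is a sketch using the two constructions above; the size n and the printed summary are illustrative. For the cumsum version the first column is u1/(3*u1+2*u2+u3), which cannot exceed 1/3, and the second column is (u1+u2)/(3*u1+2*u2+u3), which cannot exceed 1/2.

```matlab
n = 10000;

% Cumsum-and-normalize construction in 2 dimensions.
A = cumsum(rand(n,3),2);
A = bsxfun(@rdivide,A,sum(A,2));
A(:,3) = [];

% Sorted uniform draws in 2 dimensions.
B = sort(rand(n,2),2);

% Largest value reached in each column by each method.
fprintf('cumsum: max col 1 = %.3f, max col 2 = %.3f\n', max(A(:,1)), max(A(:,2)));
fprintf('sort:   max col 1 = %.3f, max col 2 = %.3f\n', max(B(:,1)), max(B(:,2)));
```

The cumsum points stay inside a small corner of the triangle 0 < x < y < 1, while the sorted points reach across all of it.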

I don't even see the sort as more complex, nor does
MATLAB. Try this:

n = 1000000;
tic
A = cumsum(rand(n,6),2);
A = bsxfun(@rdivide,A,sum(A,2));
A(:,6) = [];
toc
Elapsed time is 0.219178 seconds.

tic
B = sort(rand(n,5),2);
toc
Elapsed time is 0.163978 seconds.

See that the sort took LESS time than the cumsum,
and the sort is verifiably correct.

John
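
For anyone who still wants a sort-free route, one standard alternative (not proposed by anyone in this thread) is to build the points from exponential spacings instead of uniform ones: the running sums of six independent exponential draws, divided by their total, are distributed like five sorted uniform values. The sketch below assumes the target is uniform on (0,1); n, E and S are illustrative names.

```matlab
n = 100000;                               % number of sample vectors (arbitrary)

% Exponential(1) spacings via inverse transform: -log of a uniform draw.
E = -log(rand(n,6));

% Running totals of the spacings along each row.
S = cumsum(E,2);

% Divide the first five running totals by the overall total.
% Each row is then increasing and lies in (0,1).
A = bsxfun(@rdivide,S(:,1:5),S(:,6));

% Sanity check: the k-th sorted uniform out of 5 has mean k/6.
disp([mean(A); (1:5)/6]);
```

Note that the normalization is by the final running total S(:,6), not by the sum of all running totals. Whether this beats sort() in practice would need to be timed, since the log call is not free and the timings above show sort() is already cheap.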
