From: John Stone on
I hope this is a simple question (perhaps too simple for the
archives, which I did search). Thanks in advance for any help.

I am trying to use RandomReal[ ] to sample from bins of different
widths that span the interval 0 - 1. The bin widths represent the
weights I'm assigning to a family of trial solutions in an
optimization problem. The aim is to sample the solutions in
proportion to their weights using a uniform distribution of random
numbers generated by RandomReal[ ].

For a simple example, however, suppose there are 10 equally weighted
solutions. My selection process would use some code that looks like:

weights = Table[0.1, {10}];
bins = Accumulate[weights];
Select[bins, (# >= RandomReal[] &)][[1]]

Assuming the result of RandomReal[ ] is uniformly distributed, I
expected this to return 0.1 as frequently as it returns 0.5 or 1, but
it seems to return a distribution of values peaked around 0.5 (and
seldom returns 0.9 or 1).

Graphically, the following I think is equivalent, and gives me a
peaked, not a flat distribution:

Histogram[
Table[
Select[{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1},
(# >= RandomReal[ ] &)][[1]],
{1000}]
]

Shouldn't the procedure in my example generate a flat distribution?
Or am I missing something really simple?

Thanks again if you can help me solve this.
--
John Stone
Department of Earth and Space Sciences
University of Washington Phone: + 1 206 221-6332
Box 351310, 70 Johnson Hall Lab: + 1 206 221-6383
Seattle, WA, 98195-1310, USA Fax: + 1 206 543-0489

Web page: http://depts.washington.edu/cosmolab

From: Peter Pein on
Am Fri, 25 Jun 2010 11:26:16 +0000 (UTC)
schrieb John Stone <stone(a)geology.washington.edu>:

> I hope this is a simple question (perhaps too simple for the
> archives, which I did search). Thanks in advance for any help.
>
> I am trying to use RandomReal[ ] to sample from bins of different
> widths that span the interval 0 - 1. The bin widths represent the
> weights I'm assigning to a family of trial solutions in an
> optimization problem. The aim is to sample the solutions in
> proportion to their weights using a uniform distribution of random
> numbers generated by RandomReal[ ].
>
> For a simple example, however, suppose there are 10 equally weighted
> solutions. My selection process would use some code that looks like:
>
> weights = Table[0.1, {10}];
> bins = Accumulate[weights];
> Select[bins, (# >= RandomReal[] &)][[1]]
>
> Assuming the result of RandomReal[ ] is uniformly distributed, I
> expected this to return 0.1 as frequently as it returns 0.5 or 1, but
> it seems to return a distribution of values peaked around 0.5 (and
> seldom returns 0.9 or 1).
>
> Graphically, the following I think is equivalent, and gives me a
> peaked, not a flat distribution:
>
> Histogram[
> Table[
> Select[{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1},
> (# >= RandomReal[ ] &)][[1]],
> {1000}]
> ]
>
> Shouldn't the procedure in my example generate a flat distribution?
> Or am I missing something really simple?
>
> Thanks again if you can help me solve this.

it is the call to Select[] which has a not too obvious trap:
Select takes the list bins and compares the first element (0.1) with a
random number (say 0.123). So 0.1 will not be in the result of Select[].
Next it compares 0.2 from the bins-list with _another_ random number
(say 0.3141). =.2 should have been in the result if the random value
were the same.

If you want to do it this way (Select[] is slow, compared to the
following suggestion), try:
(tmp = RandomReal[]; Select[bins, (# >= tmp &)][[1]]).

A simple test shows that this method needs ~6 seconds on my laptop for
10^5 samples:

With[{samples=100000},
BinCounts[Table[tmp=RandomReal[];Select[bins,(#>=tmp&)][[1]],{samples}],{bins}]/samples]//N//Timing

{6.59,{0.10073,0.10007,0.09997,0.09944,0.09995,0.10008,0.09964,0.09939,0.10052}}

Using built in RandomChoice[] is about 3 times faster and has less
program text:

With[{samples=100000},
BinCounts[RandomChoice[weights->bins,samples],{bins}]/samples]//N//Timing

{2.1,{0.1009,0.10192,0.09936,0.10089,0.09974,0.09733,0.09881,0.09986,0.10099}}

Hope that helps,
Peter



 | 
Pages: 1
Prev: Display question
Next: and sampling a distribution