Prev: Display question
Next: and sampling a distribution
From: John Stone on 25 Jun 2010 07:26 I hope this is a simple question (perhaps too simple for the archives, which I did search). Thanks in advance for any help. I am trying to use RandomReal[ ] to sample from bins of different widths that span the interval 0 - 1. The bin widths represent the weights I'm assigning to a family of trial solutions in an optimization problem. The aim is to sample the solutions in proportion to their weights using a uniform distribution of random numbers generated by RandomReal[ ]. For a simple example, however, suppose there are 10 equally weighted solutions. My selection process would use some code that looks like: weights = Table[0.1, {10}]; bins = Accumulate[weights]; Select[bins, (# >= RandomReal[] &)][[1]] Assuming the result of RandomReal[ ] is uniformly distributed, I expected this to return 0.1 as frequently as it returns 0.5 or 1, but it seems to return a distribution of values peaked around 0.5 (and seldom returns 0.9 or 1). Graphically, the following I think is equivalent, and gives me a peaked, not a flat distribution: Histogram[ Table[ Select[{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, (# >= RandomReal[ ] &)][[1]], {1000}] ] Shouldn't the procedure in my example generate a flat distribution? Or am I missing something really simple? Thanks again if you can help me solve this. -- John Stone Department of Earth and Space Sciences University of Washington Phone: + 1 206 221-6332 Box 351310, 70 Johnson Hall Lab: + 1 206 221-6383 Seattle, WA, 98195-1310, USA Fax: + 1 206 543-0489 Web page: http://depts.washington.edu/cosmolab
From: Peter Pein on 26 Jun 2010 03:10 Am Fri, 25 Jun 2010 11:26:16 +0000 (UTC) schrieb John Stone <stone(a)geology.washington.edu>: > I hope this is a simple question (perhaps too simple for the > archives, which I did search). Thanks in advance for any help. > > I am trying to use RandomReal[ ] to sample from bins of different > widths that span the interval 0 - 1. The bin widths represent the > weights I'm assigning to a family of trial solutions in an > optimization problem. The aim is to sample the solutions in > proportion to their weights using a uniform distribution of random > numbers generated by RandomReal[ ]. > > For a simple example, however, suppose there are 10 equally weighted > solutions. My selection process would use some code that looks like: > > weights = Table[0.1, {10}]; > bins = Accumulate[weights]; > Select[bins, (# >= RandomReal[] &)][[1]] > > Assuming the result of RandomReal[ ] is uniformly distributed, I > expected this to return 0.1 as frequently as it returns 0.5 or 1, but > it seems to return a distribution of values peaked around 0.5 (and > seldom returns 0.9 or 1). > > Graphically, the following I think is equivalent, and gives me a > peaked, not a flat distribution: > > Histogram[ > Table[ > Select[{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}, > (# >= RandomReal[ ] &)][[1]], > {1000}] > ] > > Shouldn't the procedure in my example generate a flat distribution? > Or am I missing something really simple? > > Thanks again if you can help me solve this. it is the call to Select[] which has a not too obvious trap: Select takes the list bins and compares the first element (0.1) with a random number (say 0.123). So 0.1 will not be in the result of Select[]. Next it compares 0.2 from the bins-list with _another_ random number (say 0.3141). =.2 should have been in the result if the random value were the same. If you want to do it this way (Select[] is slow, compared to the following suggestion), try: (tmp = RandomReal[]; Select[bins, (# >= tmp &)][[1]]). A simple test shows that this method needs ~6 seconds on my laptop for 10^5 samples: With[{samples=100000}, BinCounts[Table[tmp=RandomReal[];Select[bins,(#>=tmp&)][[1]],{samples}],{bins}]/samples]//N//Timing {6.59,{0.10073,0.10007,0.09997,0.09944,0.09995,0.10008,0.09964,0.09939,0.10052}} Using built in RandomChoice[] is about 3 times faster and has less program text: With[{samples=100000}, BinCounts[RandomChoice[weights->bins,samples],{bins}]/samples]//N//Timing {2.1,{0.1009,0.10192,0.09936,0.10089,0.09974,0.09733,0.09881,0.09986,0.10099}} Hope that helps, Peter
|
Pages: 1 Prev: Display question Next: and sampling a distribution |