From: Allan W on 15 Dec 2005 18:22

> Diego Martins wrote:
> > double zero_one_rand()
> > {
> >     double r = static_cast<double>(rand()) / rand();
> >     return r <= 1 ? r : 1/r;
> > }

kanze wrote:
> Would this result in a linear distribution? (I'm too lazy to do
> the analysis, but a priori, I'm very suspicious of anything that
> involves division by rand().)

I did the analysis, expecting it to be biased.

I didn't literally run the function above, to do the analysis.
Instead, I used:

    int main() {
        for (double n=0.0; n<1.0; n+=0.01)
            for (double d=0.0; d<1.0; d+=0.01)
                if (d>0.0) std::cout << (n/d) << std::endl;
                else std::cout << "Err\n";
    }

Then I sorted the output, took the inverse of the values greater
than 1.0, and sorted again.

I was surprised. Other than the "Err" outputs, the results were
pretty evenly distributed. The line graph of the results was
practically a straight line. There was a *slight* bias towards
either zero or one.

> It also fails to meet the requirements if rand() returns the
> same value twice in a row.

Yes, we're not supposed to be able to get the value 1.0. And the
obvious "fix" of

    return r < 1 ? r : 1/r;

wouldn't help if the value was exactly 1.0, since 1/1 is still 1.0.

> Not to mention a small problem if the second call to rand()
> returns 0. (Did I mention that I'm suspicious of this sort of
> division?)

That's true too. So the revised version would look like:

    double rand_double() {
        double n,d;
        do {
            n=rand();
            d=rand();
        } while (n>=d);
        return n/d;
    }

This one also has a slight bias towards zero. (Zero seems to be
the result about twice as often as any other number.) Other than
that, the results are very linear.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
From: kanze on 16 Dec 2005 11:46

Allan W wrote:
> > Diego Martins wrote:
> > > double zero_one_rand()
> > > {
> > >     double r = static_cast<double>(rand()) / rand();
> > >     return r <= 1 ? r : 1/r;
> > > }

> kanze wrote:
> > Would this result in a linear distribution? (I'm too lazy
> > to do the analysis, but a priori, I'm very suspicious of
> > anything that involves division by rand().)

> I did the analysis, expecting it to be biased.

That's more or less my expectation as well.

> I didn't literally run the function above, to do the analysis.
> Instead, I used:

> int main() {
>     for (double n=0.0; n<1.0; n+=0.01)
>         for (double d=0.0; d<1.0; d+=0.01)
>             if (d>0.0) std::cout << (n/d) << std::endl;
>             else std::cout << "Err\n";
> }

> Then I sorted the output, took the inverse of the values
> greater than 1.0, and sorted again.

That's not analysis, that's simulation. Still, it should reveal
the worst errors.

Of course, your program is not exactly what he was doing. His
"n" and "d" were integer values, converted to double. I tried
the following:

    #include <algorithm>
    #include <iostream>

    int main() {
        int count[ 11 ] ;
        std::fill_n( count, 11, 0 ) ;
        for ( int i = 0 ; i < 10 ; ++ i ) {
            double n = (double)i ;
            for ( int j = 1 ; j < 10 ; ++ j ) {
                double d = (double)j ;
                double r = n / d ;
                if ( r >= 1.0 ) {
                    r = 1.0 / r ;
                }
                ++ count[ (int)(10.0 * r) ] ;
            }
        }
        for ( int i = 0 ; i < 11 ; ++ i ) {
            std::cout << i << ": " << count[ i ] << std::endl ;
        }
        return 0 ;
    }

I think this pretty much simulates what he was doing, with
RAND_MAX == 10. The output was:

    0: 9
    1: 8
    2: 10
    3: 8
    4: 6
    5: 12
    6: 10
    7: 8
    8: 10
    9: 0
    10: 9

Given that this is basically a test of the full period, we
should have exactly the same value in 0-9, and 0 in 10. In
fact, we are very far from it.
Curiously, a similar test using the actual random number
generator on my system (

    double n = (double)rand() ;
    double d = (double)rand() ;
    double r = n / d ;
    if ( r >= 1.0 ) {
        r = 1.0 / r ;
    }
    ++ count[ (int)(10.0 * r) ] ;

repeated 100000 times) shows apparently good results: all values
between 9800 and 10200, with a lot of variation as to which
counts are above 10000, and which are below it, according to the
seed. At least for the counts for 0 through 9. Obviously not a
proof, either, nor analysis, but it does suggest that my
suspicions might be unfounded.

> I was surprised. Other than the "Err" outputs, the results
> were pretty evenly distributed. The line graph of the results
> was practically a straight line. There was a *slight* bias
> towards either zero or one.

I'm not sure what you were measuring here. The line graph of
what results?

> > It also fails to meet the requirements if rand() returns the
> > same value twice in a row.

> Yes, we're not supposed to be able to get the value 1.0.
> And the obvious "fix" of
>     return r < 1 ? r : 1/r;
> wouldn't help if the value was exactly 1.0.

Which it will be if the random number generator returns the same
value twice. My tests with the actual random number generator on
my system show that (int)(10.0 * r) is occasionally >= 10; for
100000 trials, I get between 1 and 4 in count[10] in my test
above.

> > Not to mention a small problem if the second call to rand()
> > returns 0. (Did I mention that I'm suspicious of this sort
> > of division?)

> That's true too.

Given that a typical linear congruential generator will return a
range of 1...N, and that rand() has probably subtracted one from
this to get its required range (which starts at 0), adding one
to the results of rand() should be a good solution.

Both of these problems are easily coded around. Does the
work-around bias the results? To tell the truth, I don't know.
I'm suspicious that it might, but probably not to a really
measurable degree.
> So the revised version would look like:

>     double rand_double() {
>         double n,d;
>         do {
>             n=rand();
>             d=rand();
>         } while (n>=d);
>         return n/d;
>     }

> This one also has a slight bias towards zero. (Zero seems to
> be the result about twice as often as any other number.) Other
> than that, the results are very linear.

This one throws out about half the values.

But what I want isn't experimental data. I'm more interested in
actual analysis. I think my first test, above, comes the closest
to simulating a full cycle, and it makes me very suspicious.
Unless I've done something wrong, or misunderstood something,
the counts there should be exactly equal, and they aren't. On
the other hand, the pattern they show suggests that the problem
won't be immediately recognizable for larger values of RAND_MAX.

--
James Kanze                                    GABI Software
Conseils en informatique orientée objet/
            Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
From: Allan W on 19 Dec 2005 19:47

kanze wrote:
> Allan W wrote:
> > > Diego Martins wrote:
> > > > double zero_one_rand()
> > > > {
> > > >     double r = static_cast<double>(rand()) / rand();
> > > >     return r <= 1 ? r : 1/r;
> > > > }

> > kanze wrote:
> > > Would this result in a linear distribution? (I'm too lazy
> > > to do the analysis, but a priori, I'm very suspicious of
> > > anything that involves division by rand().)

> > I did the analysis, expecting it to be biased.

> That's more or less my expectation as well.

> > I didn't literally run the function above, to do the
> > analysis. Instead, I used:

> > int main() {
> >     for (double n=0.0; n<1.0; n+=0.01)
> >         for (double d=0.0; d<1.0; d+=0.01)
> >             if (d>0.0) std::cout << (n/d) << std::endl;
> >             else std::cout << "Err\n";
> > }

> > Then I sorted the output, took the inverse of the values
> > greater than 1.0, and sorted again.

> That's not analysis, that's simulation. Still, it should
> reveal the worst errors.

Hmm...

> Of course, your program is not exactly what he was doing. His
> "n" and "d" were integer values, converted to double.

Good point. I missed that. I doubt he meant to do that, though.

[Snip]

> > I was surprised. Other than the "Err" outputs, the results
> > were pretty evenly distributed. The line graph of the
> > results was practically a straight line. There was a
> > *slight* bias towards either zero or one.

> I'm not sure what you were measuring here. The line graph of
> what results?

I graphed the return values versus the number of times each
value was seen. The graph was very nearly linear, meaning (for
instance) that the value 0.35 was seen almost exactly the same
number of times as the value 0.36.

> My tests with the actual random number generator on my system
> show that (int)(10.0 * r) is occasionally >= 10; for 100000
> trials, I get between 1 and 4 in count[10] in my test above.

I'm surprised! That's not supposed to EVER happen, is it?
> >     double rand_double() {
> >         double n,d;
> >         do {
> >             n=rand();
> >             d=rand();
> >         } while (n>=d);
> >         return n/d;
> >     }

> > This one also has a slight bias towards zero. (Zero seems to
> > be the result about twice as often as any other number.)
> > Other than that, the results are very linear.

> This one throws out about half the values.

True. But I was trying to avoid the n==d case. I suppose we
could change the while() loop to

    while (n==d)

and change the return to

    return (n<d) ? (n/d) : (d/n);

but I haven't analyzed ... er, simulated that. :-)

> But what I want isn't experimental data. I'm more interested
> in actual analysis. I think my first test, above, comes the
> closest to simulating a full cycle, and it makes me very
> suspicious.

But I think I did a full cycle too, and got pretty different
results. I'll try to take a closer look when I have more
time...
From: sstump on 20 Dec 2005 05:47

> But what I want isn't experimental data. I'm more interested
> in actual analysis. I think my first test, above, comes the
> closest to simulating a full cycle, and it makes me very
> suspicious. Unless I've done something wrong, or misunderstood
> something, the counts there should be exactly equal, and they
> aren't. On the other hand, the pattern they show suggests that
> the problem won't be immediately recognizable for larger
> values of RAND_MAX.
>
> --
> James Kanze                                    GABI Software
> Conseils en informatique orientée objet/
>             Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Interesting questions... I believe your suspicions are well
placed. To further confuse the issue, I measured the mean of
your variable 'r' in your 'full cycle program', and it was 0.5,
which is exactly where it should be for a uniform random
variable in [0,1). Looks good, right? Not so. Look at

    http://mathworld.wolfram.com/RatioDistribution.html

The last equation on that page shows the CDF for the ratio of
two uniform random variables U = X/Y. It doesn't apply exactly
as written, as we are inverting if U > 1, but it tells me that
the ratio is not uniform. Curiously, if we use the CDF given to
find the median of the inverted variable (the point with half
the values below 1/2 and half above), we get 1/2 (independent
of the parameters of the uniform distribution of X and Y, as
long as they are the same), as your program predicted.
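For the record, for X and Y independent and uniform on (0, 1), the CDF referenced above specializes to the following standard form (stated here for convenience; it is consistent with the densities derived in the next post):

```latex
F_U(u) = P\left(\frac{X}{Y} \le u\right) =
\begin{cases}
\dfrac{u}{2}, & 0 \le u \le 1, \\[1ex]
1 - \dfrac{1}{2u}, & u > 1.
\end{cases}
```

In particular, F_U(1) = 1/2: exactly half the mass lies above 1, so folding those values back into [0, 1] by inversion can restore a uniform distribution, and the folded variable's mean and median both come out to 1/2.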
From: Keith H Duggar on 20 Dec 2005 22:25

kanze wrote:
> Obviously not a proof, either, nor analysis, but it does
> suggest that my suspicions might be unfounded.

If u1 and u2 are uniform random variables between 0 and 1 and
r = u1/u2, then r is a random variable with density

    p(r) = 1/2        : 0 <= r < 1
    p(r) = 1/(2*r^2)  : 1 <= r

and applying the transform

    r' = r    : 0 <= r < 1
    r' = 1/r  : 1 <= r

yields r' a uniform deviate

    p(r') = 1  : 0 <= r' <= 1

since for r >= 1

    p(r') = p(r)/|dr'/dr|
    p(r') = (1/(2*r^2)) / |-1/r^2|
    p(r') = (r^2)/(2*r^2)
    p(r') = 1/2

So using the ratio-of-uniforms distribution and taking the
inverse of (or ignoring) values greater than 1 does indeed give
you back a uniform distribution.

However, consider that we must ALREADY have had the ability to
generate uniform deviates in order to use this method. So what
have we gained by performing these transformations? Perhaps
nothing, or perhaps we have lowered the quality of our uniform
deviates at the expense of an extra division (since if we are
dividing by a constant like RAND_MAX we can precompute the
reciprocal, which we cannot do with the random variable u2).

Anyhow, there are papers written about using ratio-of-uniforms
deviates with LCGs, so it's worth looking those up if one is
considering using this method. Of course, there are a number of
caveats which can add non-uniformity, as the derivation I gave
applies to real numbers and not the rational uniform deviates we
actually have.