From: geo on 14 Apr 2010 06:41 The KISS (Keep-It Simple-Stupid) principle appears to be one of the better ways to get fast and reliable random numbers in a computer, by combining several simple methods. Most RNGs produce random integers---usually 32-bit--- that must be converted to floating point numbers for applications. I offer here a posting describing my latest version of a KISS RNG that produces double-precision uniform random variables directly, without need for the usual integer-to-float conversion. It is an extension of a method I provided for MATLAB, with details described in G.Marsaglia and W.W.Tsang, The 64-bit universal RNG, Statistics & Probability Letters, V66, Issue 2,2004. Rather than combining simple lagged Fibonacci and Weyl sequences, as in those versions, this latest version combines a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence, x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462 with a lag-2 SWB (Subtract-With-Borrow) sequence, z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30. Details on how to maintain exact integer arithmetic with only float operations are in the article above. This double-precision RNG, called dUNI, has an immense period, around 10^19492 versus a previous 10^61, requires just a few lines of code, is very fast, some 70 million per second, and ---a key feature---behaves very well on tests of randomness. The extensive tests of The Diehard Battery, designed for 32-bit random integers, are applied to bits 1 to 32, then 2 to 33, then 3 to 34,...,then finally bits 21 to 53 for each part of the string of 53 bits in each double-precision float dUNI(). A C version is listed below, but as only tests of magnitude and floating point addition or subtraction are required, implementations in other languages could be made available. I invite such from interested readers. If you cut, paste, compile and run the following C code, it should take around 14 seconds to generate 1000 million dUNI's, followed by the output 0.6203646342357479. ------------------------------------------------------------ /* dUNI(), a double-precision uniform RNG based on the KISS (Keep-It-Simple-Stupid) principle, combining, by subtraction mod 1, a CSWB (Complimentary-Subtract-With-Borrow) integer sequence x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53, with a SWB (Subtract-With-Borrow) sequence z[n]=z[n-2]-z[n-1]-borrow mod b=2^53. Both implemented as numerators of double precision rationals, t for CSWB, zy for SWB, each formed as (integer mod 2^53)/2^53, leading to a returned KISS value t-zw mod 1. All denominators floats of b=2^53. The CSWB sequence is based on prime p=b^1220-b^1190+1, for which the order of b mod p is (p-1)/72, so the CSWB sequence has period >2^64653 or 10^19462. The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53, is based on b^2-b-1 = 11*299419*24632443746239056514780519 with period 10*299418*(24632443746239056514780518/2) > 2^101. cc is the CSWB and SWB borrow or carry value: cc=1.0/b; */ #include <stdio.h> static double Q[1220]; static int indx=1220; double dUNI() { /* Takes 14 nanosecs, Intel Q6600,2.40GHz */ const double cc=1.0/9007199254740992.0; static double c=0.0,zc=0.0, /*current CSWB and SWB `borrow`*/ zx=5212886298506819.0/9007199254740992.0, /*SWB seed1*/ zy=2020898595989513.0/9007199254740992.0; /*SWB seed2*/ int i,j; double t; /*t: first temp, then next CSWB value*/ /* First get zy as next lag-2 SWB */ t=zx-zy-zc; zx=zy; if(t<0){zy=t+1.0; zc=cc;} else{zy=t; zc=0.0;} /*Then get t as the next lag-1220 CSWB value */ if(indx<1220) t=Q[indx++]; else /*refill Q[n] via Q[n-1220]-Q[n-1190]-c, */ { for(i=0;i<1220;i++) {j=(i<30) ? i+1190 : i-30; t=Q[j]-Q[i]+c; /*Get next CSWB element*/ if(t>0) {t=t-cc; c=cc;} else {t=t-cc+1.0; c=0.0;} Q[i]=t; } /* end i loop */ indx=1; t=Q[0]; /*set indx, exit 'else' with t=Q[0] */ } /* end else segment; return t-zy mod 1 */ return( (t<zy)? 1.0+(t-zy) : t-zy ) ; } /* end dUNI() */ int main() /*You must provide at least two 32-bit seeds*/ { int i,j; double s,t; /*Choose 32 bits for x, 32 for y */ unsigned long x=123456789, y=362436069; /*default seeds*/ /* Next, seed each Q[i], one bit at a time, */ for(i=0;i<1220;i++) /*using 9th bit from Cong+Xorshift*/ {s=0.0; t=1.0; for(j=0;j<52;j++) {t=0.5*t; /* make t=.5/2^j */ x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5); if(((x+y)>>23)&1) s=s+t; /*change bit of s, maybe */ } /* end j loop*/ Q[i]=s; } /*end i seed loop, Now generate 10^9 dUNI's: */ for(i=0;i<1000000000;i++) t=dUNI(); printf("%.16f\n",dUNI()); } /* Output, after 10^9 random uniforms: 0.6203646342357479 Should take about 14 seconds, e.g. with gcc -O3 opimizing compiler -------------------------------------------------------- */ /* Fully seeding the Q[1220] array requires 64660 seed bits, plus another 106 for the initial zx and zy of the lag-2 SWB sequence. As in the above listing, using just two 32-bit seeds, x and y in main(), to initialize the Q array, one bit at a time, by means of the resulting Congruential+Xorshift sequence may be enough for many applications. Applications such as in Law or Gaming may require enough seeds in the Q[1220] array to guarantee that each one of a huge set of possible outcomes can appear. For example, choosing a jury venire of 80 from a list of 2000 eligibles would require at least ten 53-bit seeds; choosing 180 from 4000 would require twenty 53-bit seeds. To get certification, a casino machine that could play forty simultaneous games of poker must be able to produce forty successive straight-flushes, with a resulting minimal seed set. Users can choose their 32-bit x,y for the above seeding process, or develop their own for more exacting requirements when a mere set of 64 seed bits may not be enough. Properties: 1. Simple and very fast, some 70 million double-precision uniform(0,1] random variables/second, in IEEE 754 standard form: 1 sign bit, 11 exponent bits, 52 fraction bits plus the implied 1 leading the fraction part, making a total of 53 bits for each uniform floating-point variate. 2. Period some 10^19492 (vs. the 10^61 of an earlier version), G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004) Statistics and Probability Letters}, v66, no. 2, 183-187, or an earlier version provided for Matlab.) 3. Following the KISS, (Keep-It-Simple-Stupid) principle, combines, via subtraction, a CSWB sequence x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53 based on the prime p=b^1220-b^1190+1, b=2^53, with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53. All modular arithmetic on numerators of rationals with denominators 2.^53. 4. Easily implemented in various programming languages, as only tests on magnitude and double-precision floating-point subtraction or addition required. 5. Passes all Diehard tests, http://i.cs.hku.hk/~diehard/ As Diehard is designed for 32-bit integer input, all of its 234 tests are applied to bits 1..32, then 2..33, then 3..34,.., then 22..53 of the 53 bits in each dUNI(). (To form 32-bit integer j from bits i..(i+31): t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; ) All twenty-two choices for the 32-bit segments performed very well on the 234 p-values from Diehard tests, using the default 32-bit seeds x and y. You might try your own, with possibly more extensive seeding of the Q array. */ George Marsaglia
From: James Waldby on 14 Apr 2010 13:40 On Wed, 14 Apr 2010 03:41:05 -0700, geo (George Marsaglia) wrote: [I snipped your comments and the code of dUNI double-precision uniform random variates generator and simple test, except for the following bits of main] > int main() /*You must provide at least two 32-bit seeds*/ .... > unsigned long x=123456789, y=362436069; /*default seeds*/ .... > for(...) ... > x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5); .... > Output, after 10^9 random uniforms: > 0.6203646342357479 > Should take about 14 seconds, > e.g. with gcc -O3 opimizing compiler .... 1. On my 2-GHz machine, runs take about 19 seconds. When you give a run time, perhaps give a reference to machine speed also. 2. With code compiled as is, the result I get is 0.9250927935120845 on an Athlon 64 X2 5200+ cpu. If I substitute 'int' in place of 'long' in the x, y declaration quoted above, I get the result you gave, 0.6203646342357479. Perhaps everyone in comp.lang.c knows of clean ways to fix that declaration, if any exist. A brute force fix would be to AND each operand and result in the x,y calcs in the for loop with a 32-bit mask. 3. Compiling with -Wall turned on gives warning that control reaches end of non-void function (main). So put a 'return 0' or similar at the end of main. -- jiw
From: Dann Corbit on 14 Apr 2010 14:06 For reference and comparison, here is a "down and dirty" translation into a C class. I guess that some C++ guru will take a few minutes to shine the apple a bit so that it can be genuinely useful. /* From: geo <gmarsaglia(a)gmail.com> Newsgroups: sci.math,comp.lang.c Subject: RNGs: A double KISS Date: Wed, 14 Apr 2010 03:41:05 -0700 (PDT) Xref: eternal-september.org sci.math:94533 comp.lang.c:57850 The KISS (Keep-It Simple-Stupid) principle appears to be one of the better ways to get fast and reliable random numbers in a computer, by combining several simple methods. Most RNGs produce random integers---usually 32-bit--- that must be converted to floating point numbers for applications. I offer here a posting describing my latest version of a KISS RNG that produces double-precision uniform random variables directly, without need for the usual integer-to-float conversion. It is an extension of a method I provided for MATLAB, with details described in G.Marsaglia and W.W.Tsang, The 64-bit universal RNG, Statistics & Probability Letters, V66, Issue 2,2004. Rather than combining simple lagged Fibonacci and Weyl sequences, as in those versions, this latest version combines a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence, x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462 with a lag-2 SWB (Subtract-With-Borrow) sequence, z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30. Details on how to maintain exact integer arithmetic with only float operations are in the article above. This double-precision RNG, called dUNI, has an immense period, around 10^19492 versus a previous 10^61, requires just a few lines of code, is very fast, some 70 million per second, and ---a key feature---behaves very well on tests of randomness. The extensive tests of The Diehard Battery, designed for 32-bit random integers, are applied to bits 1 to 32, then 2 to 33, then 3 to 34,...,then finally bits 21 to 53 for each part of the string of 53 bits in each double-precision float dUNI(). A C version is listed below, but as only tests of magnitude and floating point addition or subtraction are required, implementations in other languages could be made available. I invite such from interested readers. If you cut, paste, compile and run the following C code, it should take around 14 seconds to generate 1000 million dUNI's, followed by the output 0.6203646342357479. ------------------------------------------------------------ */ /* dUNI(), a double-precision uniform RNG based on the KISS (Keep-It-Simple-Stupid) principle, combining, by subtraction mod 1, a CSWB (Complimentary-Subtract-With-Borrow) integer sequence x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53, with a SWB (Subtract-With-Borrow) sequence z[n]=z[n-2]-z[n-1]-borrow mod b=2^53. Both implemented as numerators of double precision rationals, t for CSWB, zy for SWB, each formed as (integer mod 2^53)/2^53, leading to a returned KISS value t-zw mod 1. All denominators floats of b=2^53. The CSWB sequence is based on prime p=b^1220-b^1190+1, for which the order of b mod p is (p-1)/72, so the CSWB sequence has period >2^64653 or 10^19462. The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53, is based on b^2-b-1 = 11*299419*24632443746239056514780519 with period 10*299418*(24632443746239056514780518/2) > 2^101. cc is the CSWB and SWB borrow or carry value: cc=1.0/b; */ #include <cstdio> #include <cstdlib> #include <cassert> class kiss2 { private: double Q[1220]; int indx; double cc; double c; /* current CSWB */ double zc; /* current SWB `borrow` */ double zx; /* SWB seed1 */ double zy; /* SWB seed2 */ size_t qlen;/* length of Q array */ public: kiss2() { assert(sizeof (double) >= 8); cc = 1.0 / 9007199254740992.0; // inverse of 2^53rd power int i; size_t qlen = indx = sizeof Q / sizeof Q[0]; for (i = 0; i < qlen; i++) Q[i] = 0; double c = 0.0, zc = 0.0, /* current CSWB and SWB `borrow` */ zx = 5212886298506819.0 / 9007199254740992.0, /* SWB seed1 */ zy = 2020898595989513.0 / 9007199254740992.0; /* SWB seed2 */ int j; double s, t; /* Choose 32 bits for x, 32 for y */ unsigned long x = 123456789, y = 362436069; /* default seeds * / /* Next, seed each Q[i], one bit at a time, */ for (i = 0; i < qlen; i++) { /* using 9th bit from Cong+Xorshift */ s = 0.0; t = 1.0; for (j = 0; j < 52; j++) { t = 0.5 * t; /* make t=.5/2^j */ x = 69069 * x + 123; y ^= (y << 13); y ^= (y >> 17); y ^= (y << 5); if (((x + y) >> 23) & 1) s = s + t; /* change bit of s, maybe */ } /* end j loop */ Q[i] = s; } /* end i seed loop, Now generate 10^9 dUNI's: */ } double dUNI() { /* Takes 14 nanosecs, Intel Q6600,2.40GHz */ int i, j; double t; /* t: first temp, then next CSWB value */ /* First get zy as next lag-2 SWB */ t = zx - zy - zc; zx = zy; if (t < 0) { zy = t + 1.0; zc = cc; } else { zy = t; zc = 0.0; } /* Then get t as the next lag-1220 CSWB value */ if (indx < 1220) t = Q[indx++]; else { /* refill Q[n] via Q[n-1220]-Q[n-1190]-c, */ for (i = 0; i < 1220; i++) { j = (i < 30) ? i + 1190 : i - 30; t = Q[j] - Q[i] + c; /* Get next CSWB element */ if (t > 0) { t = t - cc; c = cc; } else { t = t - cc + 1.0; c = 0.0; } Q[i] = t; } /* end i loop */ indx = 1; t = Q[0]; /* set indx, exit 'else' with t=Q[0] */ } /* end else segment; return t-zy mod 1 */ return ((t < zy) ? 1.0 + (t - zy) : t - zy); } /* end dUNI() */ }; int main(void) { /* You must provide at least two 32-bit seeds */ kiss2 krng; double t; int i; for (i = 0; i < 1000000000; i++) t = krng.dUNI (); printf ("%.16f\n", krng.dUNI ()); } /* Output, after 10^9 random uniforms: 0.6203646342357479 Should take about 14 seconds, e.g. with gcc -O3 opimizing compiler -------------------------------------------------------- */ /* Fully seeding the Q[1220] array requires 64660 seed bits, plus another 106 for the initial zx and zy of the lag-2 SWB sequence. As in the above listing, using just two 32-bit seeds, x and y in main(), to initialize the Q array, one bit at a time, by means of the resulting Congruential+Xorshift sequence may be enough for many applications. Applications such as in Law or Gaming may require enough seeds in the Q[1220] array to guarantee that each one of a huge set of possible outcomes can appear. For example, choosing a jury venire of 80 from a list of 2000 eligibles would require at least ten 53-bit seeds; choosing 180 from 4000 would require twenty 53-bit seeds. To get certification, a casino machine that could play forty simultaneous games of poker must be able to produce forty successive straight-flushes, with a resulting minimal seed set. Users can choose their 32-bit x,y for the above seeding process, or develop their own for more exacting requirements when a mere set of 64 seed bits may not be enough. Properties: 1. Simple and very fast, some 70 million double-precision uniform(0,1] random variables/second, in IEEE 754 standard form: 1 sign bit, 11 exponent bits, 52 fraction bits plus the implied 1 leading the fraction part, making a total of 53 bits for each uniform floating-point variate. 2. Period some 10^19492 (vs. the 10^61 of an earlier version), G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004) Statistics and Probability Letters}, v66, no. 2, 183-187, or an earlier version provided for Matlab.) 3. Following the KISS, (Keep-It-Simple-Stupid) principle, combines, via subtraction, a CSWB sequence x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53 based on the prime p=b^1220-b^1190+1, b=2^53, with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53. All modular arithmetic on numerators of rationals with denominators 2.^53. 4. Easily implemented in various programming languages, as only tests on magnitude and double-precision floating-point subtraction or addition required. 5. Passes all Diehard tests, http://i.cs.hku.hk/~diehard/ As Diehard is designed for 32-bit integer input, all of its 234 tests are applied to bits 1..32, then 2..33, then 3..34,.., then 22..53 of the 53 bits in each dUNI(). (To form 32-bit integer j from bits i..(i+31): t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; ) All twenty-two choices for the 32-bit segments performed very well on the 234 p-values from Diehard tests, using the default 32-bit seeds x and y. You might try your own, with possibly more extensive seeding of the Q array. -- George Marsaglia */
From: Jonathan Lee on 14 Apr 2010 14:53 On Apr 14, 2:06 pm, Dann Corbit <dcor...(a)connx.com> wrote: Thanks for sharing. Isn't this dangerous for y: > for (j = 0; j < 52; j++) > { > t = 0.5 * t; /* make t=.5/2^j */ > x = 69069 * x + 123; > y ^= (y << 13); > y ^= (y >> 17); > y ^= (y << 5); > if (((x + y) >> 23) & 1) > s = s + t; /* change bit of s, maybe */ > } /* end j loop */ Specifically (y >> 17) could mix in high bits if unsigned long were 64 bit instead of (presumably) 32-bit. Just at a quick glance, I would think the operations on y would need to be masked to be portable. --Jonathan
From: Jonathan Lee on 14 Apr 2010 15:07
On Apr 14, 2:06 pm, Dann Corbit <dcor...(a)connx.com> wrote: > For reference and comparison, here is a "down and dirty" translation > into a C class. > > I guess that some C++ guru will take a few minutes to shine the apple a > bit so that it can be genuinely useful. Another thought: you could probably remove that assertion by building the double from LSb to MSb. You'd essentially be bitreversing the number which probably won't affect the randomness (can't prove it off hand). If double didn't have enough bits, the machine would simply round the value as you went along. Not ideal under some RNG scenarios, but the result should be the "same" as the actual number up to machine precision. --Jonathan |