RNGs: A double KISS [Math]

Prev: Square Roots and Sequences
Next: Cycles of 1D Dynamical System

From: geo on 14 Apr 2010 06:41

The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.

Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.

Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.

This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().

A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.

If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------

/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.

Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.

The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.

The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/

#include <stdio.h>
static double Q[1220];
static int indx=1220;

double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
const double cc=1.0/9007199254740992.0;
static double c=0.0,zc=0.0, /*current CSWB and SWB `borrow`*/
zx=5212886298506819.0/9007199254740992.0, /*SWB seed1*/
zy=2020898595989513.0/9007199254740992.0; /*SWB seed2*/
int i,j; double t; /*t: first temp, then next CSWB value*/
/* First get zy as next lag-2 SWB */
t=zx-zy-zc; zx=zy; if(t<0){zy=t+1.0; zc=cc;} else{zy=t; zc=0.0;}

/*Then get t as the next lag-1220 CSWB value */
if(indx<1220) t=Q[indx++];
else /*refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
{ for(i=0;i<1220;i++)
{j=(i<30) ? i+1190 : i-30;
t=Q[j]-Q[i]+c; /*Get next CSWB element*/
if(t>0) {t=t-cc; c=cc;} else {t=t-cc+1.0; c=0.0;}
Q[i]=t;
} /* end i loop */
indx=1; t=Q[0]; /*set indx, exit 'else' with t=Q[0] */
} /* end else segment; return t-zy mod 1 */
return( (t<zy)? 1.0+(t-zy) : t-zy ) ;
} /* end dUNI() */

int main() /*You must provide at least two 32-bit seeds*/
{ int i,j; double s,t; /*Choose 32 bits for x, 32 for y */
unsigned long x=123456789, y=362436069; /*default seeds*/
/* Next, seed each Q[i], one bit at a time, */
for(i=0;i<1220;i++) /*using 9th bit from Cong+Xorshift*/
{s=0.0; t=1.0;
for(j=0;j<52;j++)
{t=0.5*t; /* make t=.5/2^j */
x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5);
if(((x+y)>>23)&1) s=s+t; /*change bit of s, maybe */
} /* end j loop*/
Q[i]=s;
} /*end i seed loop, Now generate 10^9 dUNI's: */
for(i=0;i<1000000000;i++) t=dUNI();
printf("%.16f\n",dUNI());
}

/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/

/*
Fully seeding the Q[1220] array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.

Applications such as in Law or Gaming may
require enough seeds in the Q[1220] array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.

2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)

3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.

4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.

5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.
*/

George Marsaglia

From: James Waldby on 14 Apr 2010 13:40

On Wed, 14 Apr 2010 03:41:05 -0700, geo (George Marsaglia) wrote:

[I snipped your comments and the code of dUNI double-precision
uniform random variates generator and simple test, except for
the following bits of main]

> int main() /*You must provide at least two 32-bit seeds*/
....
> unsigned long x=123456789, y=362436069; /*default seeds*/
....
> for(...) ...
> x=69069*x+123; y^=(y<<13);y^=(y>>17);y^=(y<<5);
....
> Output, after 10^9 random uniforms:
> 0.6203646342357479
> Should take about 14 seconds,
> e.g. with gcc -O3 opimizing compiler
....

1. On my 2-GHz machine, runs take about 19 seconds. When you
give a run time, perhaps give a reference to machine speed also.

2. With code compiled as is, the result I get is 0.9250927935120845
on an Athlon 64 X2 5200+ cpu. If I substitute 'int' in place of
'long' in the x, y declaration quoted above, I get the result you
gave, 0.6203646342357479. Perhaps everyone in comp.lang.c knows
of clean ways to fix that declaration, if any exist. A brute force
fix would be to AND each operand and result in the x,y calcs in the
for loop with a 32-bit mask.

3. Compiling with -Wall turned on gives warning that control
reaches end of non-void function (main). So put a 'return 0'
or similar at the end of main.

--
jiw

From: Dann Corbit on 14 Apr 2010 14:06

For reference and comparison, here is a "down and dirty" translation
into a C class.

I guess that some C++ guru will take a few minutes to shine the apple a
bit so that it can be genuinely useful.

/*
From: geo <gmarsaglia(a)gmail.com>
Newsgroups: sci.math,comp.lang.c
Subject: RNGs: A double KISS
Date: Wed, 14 Apr 2010 03:41:05 -0700 (PDT)
Xref: eternal-september.org sci.math:94533 comp.lang.c:57850

The KISS (Keep-It Simple-Stupid) principle
appears to be one of the better ways to get
fast and reliable random numbers in a computer,
by combining several simple methods.

Most RNGs produce random integers---usually 32-bit---
that must be converted to floating point numbers
for applications. I offer here a posting
describing my latest version of a KISS RNG that
produces double-precision uniform random variables
directly, without need for the usual integer-to-float
conversion. It is an extension of a method I provided
for MATLAB, with details described in
G.Marsaglia and W.W.Tsang, The 64-bit universal RNG,
Statistics & Probability Letters, V66, Issue 2,2004.

Rather than combining simple lagged Fibonacci and Weyl
sequences, as in those versions, this latest version combines
a lag-1220 CSWB (Complimentary-Subtract-With-Borrow) sequence,
x[n]=x[n-1220-x[n-1190]-borrow \mod b=2^53, period >10^19462
with a lag-2 SWB (Subtract-With-Borrow) sequence,
z[n]=z[n-1]-z[n-2]-borrow mod b=2^53. period>10^30.
Details on how to maintain exact integer arithmetic
with only float operations are in the article above.

This double-precision RNG, called dUNI, has an immense period,
around 10^19492 versus a previous 10^61, requires just a few
lines of code, is very fast, some 70 million per second, and
---a key feature---behaves very well on tests of randomness.
The extensive tests of The Diehard Battery, designed for 32-bit
random integers, are applied to bits 1 to 32, then 2 to 33,
then 3 to 34,...,then finally bits 21 to 53 for each part of
the string of 53 bits in each double-precision float dUNI().

A C version is listed below, but as only tests of magnitude
and floating point addition or subtraction are required,
implementations in other languages could be made available.
I invite such from interested readers.

If you cut, paste, compile and run the following C code,
it should take around 14 seconds to generate 1000 million
dUNI's, followed by the output 0.6203646342357479.
------------------------------------------------------------
*/
/*
dUNI(), a double-precision uniform RNG based
on the KISS (Keep-It-Simple-Stupid) principle,
combining, by subtraction mod 1, a CSWB
(Complimentary-Subtract-With-Borrow) integer sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod b=2^53,
with a SWB (Subtract-With-Borrow) sequence
z[n]=z[n-2]-z[n-1]-borrow mod b=2^53.

Both implemented as numerators of double precision
rationals, t for CSWB, zy for SWB, each formed as
(integer mod 2^53)/2^53, leading to a returned KISS
value t-zw mod 1. All denominators floats of b=2^53.

The CSWB sequence is based on prime p=b^1220-b^1190+1,
for which the order of b mod p is (p-1)/72, so the CSWB
sequence has period >2^64653 or 10^19462.

The SWB-lag2 sequence z[n]=z[n-2]-z[n-1]-zc mod b=2^53,
is based on b^2-b-1 = 11*299419*24632443746239056514780519
with period 10*299418*(24632443746239056514780518/2) > 2^101.
cc is the CSWB and SWB borrow or carry value: cc=1.0/b;
*/

#include <cstdio>
#include <cstdlib>
#include <cassert>

class kiss2 {

private:
double Q[1220];
int indx;
double cc;
double c; /* current CSWB */
double zc; /* current SWB `borrow` */
double zx; /* SWB seed1 */
double zy; /* SWB seed2 */
size_t qlen;/* length of Q array */

public:
kiss2()
{
assert(sizeof (double) >= 8);
cc = 1.0 / 9007199254740992.0; // inverse of 2^53rd power
int i;
size_t qlen = indx = sizeof Q / sizeof Q[0];
for (i = 0; i < qlen; i++)
Q[i] = 0;
double c = 0.0, zc = 0.0, /* current CSWB and SWB `borrow` */
zx = 5212886298506819.0 / 9007199254740992.0, /* SWB seed1 */
zy = 2020898595989513.0 / 9007199254740992.0; /* SWB seed2 */
int j;
double s, t; /* Choose 32 bits for x, 32 for y */
unsigned long x = 123456789, y = 362436069; /* default seeds * /
/* Next, seed each Q[i], one bit at a time, */
for (i = 0; i < qlen; i++)
{ /* using 9th bit from Cong+Xorshift */
s = 0.0;
t = 1.0;
for (j = 0; j < 52; j++)
{
t = 0.5 * t; /* make t=.5/2^j */
x = 69069 * x + 123;
y ^= (y << 13);
y ^= (y >> 17);
y ^= (y << 5);
if (((x + y) >> 23) & 1)
s = s + t; /* change bit of s, maybe */
} /* end j loop */
Q[i] = s;
} /* end i seed loop, Now generate 10^9 dUNI's: */
}

double dUNI()
{ /* Takes 14 nanosecs, Intel Q6600,2.40GHz */
int i, j;
double t; /* t: first temp, then next CSWB value */
/* First get zy as next lag-2 SWB */
t = zx - zy - zc;
zx = zy;
if (t < 0)
{
zy = t + 1.0;
zc = cc;
}
else
{
zy = t;
zc = 0.0;
}

/* Then get t as the next lag-1220 CSWB value */
if (indx < 1220)
t = Q[indx++];
else
{ /* refill Q[n] via Q[n-1220]-Q[n-1190]-c, */
for (i = 0; i < 1220; i++)
{
j = (i < 30) ? i + 1190 : i - 30;
t = Q[j] - Q[i] + c; /* Get next CSWB element */
if (t > 0)
{
t = t - cc;
c = cc;
}
else
{
t = t - cc + 1.0;
c = 0.0;
}
Q[i] = t;
} /* end i loop */
indx = 1;
t = Q[0]; /* set indx, exit 'else' with t=Q[0] */
} /* end else segment; return t-zy mod 1 */
return ((t < zy) ? 1.0 + (t - zy) : t - zy);
} /* end dUNI() */
};

int main(void)
{ /* You must provide at least two 32-bit seeds */
kiss2 krng;
double t;
int i;
for (i = 0; i < 1000000000; i++)
t = krng.dUNI ();
printf ("%.16f\n", krng.dUNI ());
}

/*
Output, after 10^9 random uniforms:
0.6203646342357479
Should take about 14 seconds,
e.g. with gcc -O3 opimizing compiler
--------------------------------------------------------
*/

/*
Fully seeding the Q[1220] array requires 64660
seed bits, plus another 106 for the initial zx
and zy of the lag-2 SWB sequence.
As in the above listing, using just two 32-bit
seeds, x and y in main(), to initialize the Q
array, one bit at a time, by means of the
resulting Congruential+Xorshift sequence
may be enough for many applications.

Applications such as in Law or Gaming may
require enough seeds in the Q[1220] array to
guarantee that each one of a huge set of
possible outcomes can appear. For example,
choosing a jury venire of 80 from a
list of 2000 eligibles would require at least
ten 53-bit seeds; choosing 180 from 4000 would
require twenty 53-bit seeds.
To get certification, a casino machine that could
play forty simultaneous games of poker must be
able to produce forty successive straight-flushes,
with a resulting minimal seed set.

Users can choose their 32-bit x,y for the
above seeding process, or develop their own
for more exacting requirements when a mere
set of 64 seed bits may not be enough.

Properties:
1. Simple and very fast, some 70 million double-precision
uniform(0,1] random variables/second, in IEEE 754 standard
form: 1 sign bit, 11 exponent bits, 52 fraction bits
plus the implied 1 leading the fraction part, making a
total of 53 bits for each uniform floating-point variate.

2. Period some 10^19492 (vs. the 10^61 of an earlier version),
G. Marsaglia and W.W. Tsang, "The 64-bit universal RNG",(2004)
Statistics and Probability Letters}, v66, no. 2, 183-187,
or an earlier version provided for Matlab.)

3. Following the KISS, (Keep-It-Simple-Stupid) principle,
combines, via subtraction, a CSWB sequence
x[n]=x[n-1220]-x[n-1190]-borrow mod 2^53
based on the prime p=b^1220-b^1190+1, b=2^53,
with a SWB lag-2 sequence z[n]=w[n-2]-z[n-1] -c mod 2^53.
All modular arithmetic on numerators of rationals
with denominators 2.^53.

4. Easily implemented in various programming languages,
as only tests on magnitude and double-precision floating-point
subtraction or addition required.

5. Passes all Diehard tests,
http://i.cs.hku.hk/~diehard/
As Diehard is designed for 32-bit integer input, all of its
234 tests are applied to bits 1..32, then 2..33, then 3..34,..,
then 22..53 of the 53 bits in each dUNI().
(To form 32-bit integer j from bits i..(i+31):
t=dUNI()*(1<<(i-1)); i=t; j=(t-i)*4294967296.0; )
All twenty-two choices for the 32-bit segments performed
very well on the 234 p-values from Diehard tests, using
the default 32-bit seeds x and y. You might try your own,
with possibly more extensive seeding of the Q array.

-- George Marsaglia

*/

From: Jonathan Lee on 14 Apr 2010 14:53

On Apr 14, 2:06 pm, Dann Corbit <dcor...(a)connx.com> wrote:

Thanks for sharing.

Isn't this dangerous for y:

> for (j = 0; j < 52; j++)
> {
> t = 0.5 * t; /* make t=.5/2^j */
> x = 69069 * x + 123;
> y ^= (y << 13);
> y ^= (y >> 17);
> y ^= (y << 5);
> if (((x + y) >> 23) & 1)
> s = s + t; /* change bit of s, maybe */
> } /* end j loop */

Specifically (y >> 17) could mix in high bits if unsigned
long were 64 bit instead of (presumably) 32-bit. Just at
a quick glance, I would think the operations on y
would need to be masked to be portable.

--Jonathan

From: Jonathan Lee on 14 Apr 2010 15:07

On Apr 14, 2:06 pm, Dann Corbit <dcor...(a)connx.com> wrote:
> For reference and comparison, here is a "down and dirty" translation
> into a C class.
>
> I guess that some C++ guru will take a few minutes to shine the apple a
> bit so that it can be genuinely useful.

Another thought: you could probably remove that assertion by building
the double from LSb to MSb. You'd essentially be bitreversing the
number which probably won't affect the randomness (can't prove it
off hand). If double didn't have enough bits, the machine would simply
round the value as you went along. Not ideal under some RNG scenarios,
but the result should be the "same" as the actual number up to machine
precision.

--Jonathan

| Next | Last
Pages: 1 2 3
Prev: Square Roots and Sequences
Next: Cycles of 1D Dynamical System