From: rossum on
On Fri, 27 Nov 2009 13:52:21 -0800 (PST), geo <gmarsaglia(a)gmail.com>
wrote:

> On Nov 3 I posted
>
> RNGs: A Super KISS sci.math, comp.lang.c, sci.crypt
>
>a KISS (Keep-It-Simple-Stupid) RNG combining,
>by addition mod 2^32, three simple RNGs:
> CMWC(Complementary-Multiply-With-Carry)+CNG(Congruential)
> +XS(Xorshift)
>with resulting period greater than 10^402575.
>
>The extreme period comes from finding a prime p=a*b^r+1 for
>which the order of b has magnitude near p-1, then use
>the CMWC method, the mathematics of which I outline here:
>
>Let Z be the set of all "vectors" of the form
> [x_1,...,x_r;c] with 0<=x_i<b and 0<=c<a.
>Then Z has ab^r elements, and if the function f() on Z is
> f([x_1,x2,...,x_r;c])=
> [x_2,...,x_r,b-1-(t mod b);trunc((t/b)], t=a*x_1+c
>then f() has an inverse on Z: for each z in Z there is exactly
>one w in Z for which f(w)=z:
> f^{-1}([x_1,x2,...,x_r;c])=
> [trunc((v/a),x_1,...,x_{r-1}; v mod a], v=cb+(b-1)-x_r
>
>Thus a directed graph based on z->f(z) will consist only
>of disjoint loops of size s=order(b,p) and there will be
>L=ab^r/s such loops.
>
>A random uniform choice of z from Z is equally likely to
>fall in any one of the L loops, and then each "vector" in
>the sequence obtained by iterating f:
> f(z), f(f(z)), f(f(f(z))),...
>will have a uniform distribution over that loop, and the sequence
>determined by taking the r'th element from each "vector",
>(the output of the CMWC RNG) will be periodic with period
>the order of b for the prime p=ab^r+1, and that sequence
>will produce, in reverse order, the base-b expansion of a
>rational k/p for some k determined by choice of the seed z.
>
>For that Nov 3 post, I had found that the order of b=2^32
>for the prime p=7010176*b^41790+1 is 54767*2^1337279, about
>10^402566, thus providing an easily-implemented
>KISS=CMWC+CNG+XS RNG with immense period and
>excellent performance on tests of randomness.
>
>That easy implementation required carrying out the essential
>parts of the CMWC function f(): form t=7010176*x+c in 64 bits,
>extract the top 32 bits for the new c, the bottom 32 for
>the new x---easy to do in C, not easy in Fortran.
>And if we want 64-bit random numbers, with B=2^64, our prime
>becomes 7010176*B^20985+1, for which the period of B is
>54767*2^1337278, still immense, but in C, with 64-bit x,c
>there seems no easy way to form t=7010176*x+c in 128 bits,
>then extract the top and bottom 64 bit halves.
>
>So base b=2^32 works for C but not Fortran,
>and base B=2^64 works for neither C nor Fortran.
>
>I offer here is a prime that provides CMWC RNGs for both
>32- and 64-bits, and for both C and Fortran, and with
>equally massive periods, again greater than 2^(1.3million):
>
> p=640*b^41265+1 = 2748779069440*B^20632+1 = 5*2^1320487+1.
>
>That prime came from the many who have dedicated their
>efforts and computer time to prime searches. After some
>three weeks of dedicated computer time using pfgw with
>scrypt, I found the orders of b and B:
> 5*2^1320481 for b=2^32, 5*2^1320480 for B=2^64.
>
>It is the choice of the "a" that makes it feasible to get
>the top and bottom valves of t=a*x+c, yet stay within the
>integer sizes the C or Fortran compilers are set for.
>In the above prime: a=640=2^9+2^7 for b=2^32 and
> a=2748779069440=2^41+2^39 for B=2^64.
>Thus, for example with b=2^32 and using only 32-bit C code,
>with a supposed 128-bit t=(2^9+2^7)*x+c, the top and bottom
>32-bits of t may be obtained by setting, say,
> h=(c&1); z=(x<<9)>>1 + (x<<7)>>1 + c>>1;
>then the top half of that t would be
> c=(x>>23)+(x>>25)+(z>>31);
>and the bottom half, before being complemented, would be
> x=(z<<1)+h;
>
>When B=2^64 we need only change to
> h=(c&1); z=(x<<41)>>1 + (x<<39)>>1 + c>>1;
> c=(x>>23)+(x>>25)+(z>>63);
>
>These C operations all have Fortran equivalents, and will
>produce the required bit patterns, whether integers are
>considered signed or unsigned. (In C, one must make sure
>that the >> operation performs a logical right shift,
>perhaps best done via "unsigned" declarations.)
>
>The CMWC z "vector" elements [x_1,x_2,...,x_r] are kept in
>an array, Q[] in C, Q() in Fortran, with a separate current
>"carry". This is all spelled out in the following examples:
>code for 32- and 64-bit SuperKiss RNGs for C and Fortran.
>
>Note that in these sample listings, the Q array is seeded
>by CNG+XS, based on the seed values specified in the
>initial declarations. For many simulation studies, the
>73 bits needed to seed the initial xcng, xs and carry<a
>for the 32-bit version, or 169 bits needed for the 64-bit
>version, may be adequate.
>But more demanding applications may require a significant
>portion of the >1.3 million seed bits that Q requires.
>See text and comments from the Nov 3 posting.
>
>I am indebted to an anonymous mecej4 for providing the basic
>form and KIND declarations of the Fortran versions.
>
>Please let me and other readers know if the results are not
>as specified when run with your compilers, or if you can
>provide equivalent versions in other programming languages.
>
>George Marsaglia
>
>--------------------------------------------------------
>Here is SUPRKISS64.c, the immense-period 64-bit RNG. I
>invite you to cut, paste, compile and run to see if you
>get the result I do. It should take around 20 seconds.
>--------------------------------------------------------
>/* SUPRKISS64.c, period 5*2^1320480*(2^64-1) */
>#include <stdio.h>
> static unsigned long long Q[20632],carry=36243678541LL,
> xcng=12367890123456LL,xs=521288629546311LL,indx=20632;
>
> #define CNG ( xcng=6906969069LL*xcng+123 )
> #define XS ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
> #define SUPR ( indx<20632 ? Q[indx++] : refill() )
> #define KISS SUPR+CNG+XS
>
> unsigned long long refill( )
> {int i; unsigned long long z,h;
> for(i=0;i<20632;i++){ h=(carry&1);
> z=((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
> carry=(Q[i]>>23)+(Q[i]>>25)+(z>>63);
> Q[i]=~((z<<1)+h); }
> indx=1; return (Q[0]);
> }
>
> int main()
> {int i; unsigned long long x;
> for(i=0;i<20632;i++) Q[i]=CNG+XS;
> for(i=0;i<1000000000;i++) x=KISS;
> printf("Does x=4013566000157423768\n x=%LLd.\n",x);
>}
>---------------------------------------------------------
>
>Here is SUPRKISS32.c, the immense-period 32-bit RNG. I
>invite you to cut, paste, compile and run to see if you
>get the result I do. It should take around 10 seconds.
>---------------------------------------------------------
>/*suprkiss64.c
> b=2^64; x[n]=(b-1)-[(2^41+2^39)*x[n-20632]+carry mod b]
> period 5*2^1320480>10^397505
>This version of SUPRKISS doesn't use t=a*x+c in 128 bits,
>but uses only 64-bit stuff, takes 20 nanos versus 7.5 for
>the 32-bit unsigned long long t=a*x+c version.
>*/
>
>/* SUPRKISS64.c, period 5*2^1320480*(2^64-1) */
>#include <stdio.h>
> static unsigned long long Q[20632],carry=36243678541LL,
> xcng=12367890123456LL,xs=521288629546311LL,indx=20632;
>
> #define CNG ( xcng=6906969069LL*xcng+123 )
> #define XS ( xs^=xs<<13,xs^=xs>>17,xs^=xs<<43 )
> #define SUPR ( indx<20632 ? Q[indx++] : refill() )
> #define KISS SUPR+CNG+XS
>
> unsigned long long refill( )
> {int i; unsigned long long z,h;
> for(i=0;i<20632;i++){ h=(carry&1);
> z=((Q[i]<<41)>>1)+((Q[i]<<39)>>1)+(carry>>1);
> carry=(Q[i]>>23)+(Q[i]>>25)+(z>>63);
> Q[i]=~((z<<1)+h); }
> indx=1; return (Q[0]);
> }
>
> int main()
> {int i; unsigned long long x;
> for(i=0;i<20632;i++) Q[i]=CNG+XS;
> for(i=0;i<1000000000;i++) x=KISS;
> printf("Does x=4013566000157423768\n x=%LLd.\n",x);
>}
>
>-----------------------------------------------------------
>
>And here are equivalent Fortran versions, which, absent
>C's inline features, seem to need ~10% more run time.

[Fortran versions removed, along with comp.lang.fortran
and comp.lang.c]

>
>---------------------------------------------------------------

Two simple Java versions. Neither extends System.Random, though it
would not be too difficult to do so if required.

rossum


/*** 32 bit SuperKiss32 ***/

/**
* <p>An implementation of George Marsaglia's long period SuperKISS32
* PRNG. The period of this PRNG is 5*2^1320481*(2^32-1)</p>
*
* <p>This version is written for clarity rather than speed.</p>
*
* @author rossum
* @version 1.0; November 2009
*/
public class SuperKiss32 {

final int Q_SIZE = 41265;

int mQ[] = new int[Q_SIZE];

private int mCarry = 362;
private int mXcng = 1236789;
private int mXs = 521288629;
private int mIndex = Q_SIZE + 1;

public SuperKiss32() {
// Fill mQ[] with Congruential + XorShift
for (int i = 0; i < Q_SIZE; ++i) {
congruential();
xorShift();
mQ[i] = (mXcng + mXs);
}
}

public int nextInt() {
return stepGenerator();
}

private int refillQ() {
int z, h;
for (int i = 0; i < Q_SIZE; ++i) {
h = mCarry & 0x01;

z = ((mQ[i] << 9) >>> 1) +
((mQ[i] << 7) >>> 1) +
(mCarry >>> 1);

mCarry = (mQ[i] >>> 23) + (mQ[i] >>> 25) + (z >>> 31);

mQ[i] = ~((z << 1) + h);
}
mIndex = 1;
return mQ[0];
}

private void congruential() {
mXcng *= 69069;
mXcng += 123;
}

private void xorShift() {
mXs ^= (mXs << 13);
mXs ^= (mXs >>> 17);
mXs ^= (mXs << 5);
}

private int supr() {
return (mIndex < Q_SIZE) ? mQ[mIndex++] : refillQ();
}

private int stepGenerator() {
congruential();
xorShift();
return (supr() + mXcng + mXs);
}

public static boolean testKiss32(boolean verbose) {
int x = 0;
final int expected = 1809478889;
boolean retVal;

SuperKiss32 sk32 = new SuperKiss32();
if (verbose) {
System.out.println("Testing SuperKiss32...");
System.out.println();
}
int reps = 1000000000;
for (int i = 0; i < reps; ++i) {
x = sk32.stepGenerator();
}

if (x == expected) {
if (verbose) {
System.out.println(
"One billion uses of SuperKiss32 OK.");
}
retVal = true;
} else {
if (verbose) {
System.out.println("SuperKiss32 failed: x == " +
x + ", not " + expected);
}
retVal = false;
}
return retVal;
}

} // end class SuperKiss32

/**************************/


/*** 64 bit SuperKiss64 ***/

/**
* <p>An implementation of George Marsaglia's long period SuperKISS64
* PRNG. The period of this PRNG is 5*2^1320480*(2^64-1)</p>
*
* <p>This version is written for clarity rather than speed.</p>
*
* @author rossum
* @version 1.0; November 2009
*/
public class SuperKiss64 {
final int Q_SIZE = 20632;

long mQ[] = new long[Q_SIZE];

private long mCarry = 36243678541L;
private long mXcng = 12367890123456L;
private long mXs = 521288629546311L;
private int mIndex = Q_SIZE + 1;

public SuperKiss64() {
// Fill mQ[] with Congruential + XorShift
for (int i = 0; i < Q_SIZE; ++i) {
congruential();
xorShift();
mQ[i] = (mXcng + mXs);
}
}

public long nextLong() {
return stepGenerator();
}

private long refillQ() {
long z, h;
for (int i = 0; i < Q_SIZE; ++i) {
h = mCarry & 0x01L;

z = ((mQ[i] << 41) >>> 1) +
((mQ[i] << 39) >>> 1) +
(mCarry >>> 1);

mCarry = (mQ[i] >>> 23) + (mQ[i] >>> 25) + (z >>> 63);

mQ[i] = ~((z << 1) + h);
}
mIndex = 1;
return mQ[0];
}

private void congruential() {
mXcng *= 6906969069L;
mXcng += 123L;
}

private void xorShift() {
mXs ^= (mXs << 13);
mXs ^= (mXs >>> 17);
mXs ^= (mXs << 43);
}

private long supr() {
return (mIndex < Q_SIZE) ? mQ[mIndex++] : refillQ();
}

private long stepGenerator() {
congruential();
xorShift();
return (supr() + mXcng + mXs);
}

public static boolean testKiss64(boolean verbose) {
long x = 0;
final long expected = 4013566000157423768L;
boolean retVal;

SuperKiss64 sk64 = new SuperKiss64();
if (verbose) {
System.out.println("Testing SuperKiss64...");
System.out.println();
}
int reps = 1000000000;
for (int i = 0; i < reps; ++i) {
x = sk64.stepGenerator();
}

if (x == expected) {
if (verbose) {
System.out.println(
"One billion uses of SuperKiss64 OK.");
}
retVal = true;
} else {
if (verbose) {
System.out.println("SuperKiss64 failed: x == " +
x + ", not " + expected);
}
retVal = false;
}
return retVal;
}

} // end class SuperKiss64

/**************************/