From: Ilmari Karonen on 31 Jul 2010 10:25

["Followup-To:" header set to sci.math.]

On 2010-07-30, Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:
> In article <i2tvn7$ldf$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk wrote:
>> In article <ron-shepard-969C9C.22130629072010(a)news60.forteinc.com>,
>> Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:
>>>
>>> The situation that needs to be avoided is to run the code once with one
>>> seed, and then run it again with another seed that results in an overlap
>>> of the sequences of values for the two runs. In some applications this
>>> is unimportant, but in other applications it would be bad for two
>>> supposedly independent runs to have a long sequence of identical values.
>>
>> No, no, a thousand times, NO! That is NOT enough, though FAR too many
>> Web pages, published papers and books claim that it is. Disjointness
>> isn't even a poor relation of randomness.
> [snip]
>> Parallel random number generation is not easy, and 99% of the stuff
>> published on it is somewhere between grossly misleading and complete
>> nonsense.
>
> I think in the parallel case, one would want to be able to generate
> a seed to produce values that are guaranteed not to overlap with any
> other node. Maybe something like
>
>    call RANDOM_NEW_SEED( old_seed, n, new_seed, my_node )
>
> would be a sufficient interface. new_seed(:) would depend on my_node
> in such a way that the generated sequence would not overlap with that
> produced by any other possible value of my_node (again, assuming the
> cycle length is long enough to satisfy that request).

Let me try to demonstrate, using a simplified (and admittedly somewhat
artificial) counterexample, why lack of overlap is not a sufficient
condition for independence.

Assume we have a "random oracle" R: Z -> [0,1) which takes as its input
an integer and returns a uniformly distributed random real number
between 0 and 1, such that the same input always produces the same
output, but the outputs for different inputs are completely independent.

Given such an oracle, we can construct a perfect PRNG P: Z -> [0,1)^N
which takes as its input an integer k and returns the sequence
<R(k), R(k+1), R(k+2), ...>. Obviously, the sequence generated by P
would be indistinguishable from random (since it is indeed, by
definition, random) and non-repeating.

Now, what if we wanted several independent streams of random numbers?
Obviously, we can't just pass different seed values to P, since we know
that the streams would eventually overlap. We could solve the problem
by modifying P, e.g. to make it return the sequence
<R(f(k,0)), R(f(k,1)), R(f(k,2)), ...> instead, where f is an injective
function from Z^2 to Z. But what if we wanted to do it _without_
modifying P?

One _bad_ solution would be to define a new generator Q: Z -> [0,1)^N
as Q(k)_i = frac(P(0)_i + ak), where a is some irrational number and
frac: R -> [0,1) returns the part of a real number "after the decimal
point" (i.e. frac(x) = x - floor(x)).

Clearly, the sequences returned by Q for different seed values would
never overlap, and, individually, they would each still be perfectly
random. Yet, just as clearly, all those sequences would be related by a
simple linear relation that would be blatantly obvious if you, say,
plotted two of them against each other.

The _right_ solution, as I suggested above, would have been to redesign
the underlying generator so that the different streams would be not
just non-overlapping but actually statistically independent.
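To see the flaw numerically, here is a quick C sketch of the bad
construction (purely an illustration of mine: rand() stands in for the
ideal oracle R, and a = sqrt(2) is an arbitrary irrational constant).
Printing the streams for the "seeds" k = 1 and k = 2 side by side shows
that their difference mod 1 is the same constant, frac(a), at every
index:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* frac(x) = x - floor(x), the part of x "after the decimal point" */
    static double frac(double x) { return x - floor(x); }

    int main(void)
    {
        const double a = sqrt(2.0);   /* an arbitrary irrational constant */
        double p0[10];                /* P(0)_i: one fixed base stream */
        int i;

        /* rand() stands in for the ideal oracle R */
        srand(12345);
        for (i = 0; i < 10; i++)
            p0[i] = rand() / ((double) RAND_MAX + 1.0);

        /* Q(k)_i = frac(P(0)_i + a*k), for the two seeds k = 1 and k = 2 */
        for (i = 0; i < 10; i++) {
            double q1 = frac(p0[i] + a * 1.0);
            double q2 = frac(p0[i] + a * 2.0);
            /* q2 - q1 (mod 1) is the same constant, frac(a), at every i */
            printf("%8.5f  %8.5f   diff mod 1 = %.5f\n", q1, q2, frac(q2 - q1));
        }
        return 0;
    }

Each column on its own would pass a uniformity test, yet knowing one
stream lets you predict the other exactly.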
The same general advice holds for practical PRNGs too, not just for my
idealized example. You can't just take an arbitrary PRNG, designed for
generating a _single_ stream of random-seeming numbers, and expect its
output to still look random when you get to compare several distinct
output streams generated from related seeds. It might turn out that it
does, if the designer of the PRNG was careful or just lucky. But
designing a PRNG to satisfy such a requirement is a strictly harder
problem than simply designing it to generate a single random-looking
stream.

--
Ilmari Karonen

To reply by e-mail, please replace ".invalid" with ".net" in address.
From: Uno on 31 Jul 2010 17:54

Ilmari Karonen wrote:
> ["Followup-To:" header set to sci.math.]
> On 2010-07-30, Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote:
> [snip]
>
> Let me try to demonstrate, using a simplified (and admittedly somewhat
> artificial) counterexample, why lack of overlap is not a sufficient
> condition for independence.
>
> Assume we have a "random oracle" R: Z -> [0,1) which takes as its input
> an integer and returns a uniformly distributed random real number
> between 0 and 1, such that the same input always produces the same
> output, but the outputs for different inputs are completely independent.
>
> Given such an oracle, we can construct a perfect PRNG P: Z -> [0,1)^N
> which takes as its input an integer k and returns the sequence
> <R(k), R(k+1), R(k+2), ...>.
>
> [snip]
>
> We could solve the problem by modifying P, e.g. to make it return the
> sequence <R(f(k,0)), R(f(k,1)), R(f(k,2)), ...> instead, where f is an
> injective function from Z^2 to Z.
>
> [snip]
>
> But designing a PRNG to satisfy such a requirement is a strictly harder
> problem than simply designing it to generate a single random-looking
> stream.

Now that's an interesting read. Ron is a person who makes "mistakes" in
contemporary scientific programming, but he always moves the scrum
forward in a useful way.

I notice that follow-ups are set to sci.math, which I've never really
read much. When I have needed mathematical help on Usenet, I've turned
to the Germans in de.sci.mathematik, who have a very well-stylized
notation.

I'm used to seeing the notation of functions from the point of view of
abstract algebra (in particular the Gallian text, which follows me
everywhere). But there's been a lot of water under the bridge (and one
concussion) since I had the leisure of reading college textbooks. So my
question for you is going to be more of a request, namely, that you say
a few more words about the mappings, what sets are involved, and what
things like Z^2 mean exactly.

To reveal my naivete, I would think a perfect PRNG would look like:

   P: Z^N -> [0,1)^N

By the way, nice job getting the interval [0,1) correct. That's a
Fortranism.

Cheers,
--
Uno
From: sturlamolden on 31 Jul 2010 20:31

On 31 Jul, 16:25, Ilmari Karonen <usen...(a)vyznev.invalid> wrote:
> The _right_ solution, as I suggested above, would have been to redesign
> the underlying generator so that the different streams would be not
> just non-overlapping but actually statistically independent.

Yes. For the Mersenne Twister it has been suggested to use different
characteristic polynomials that are "relatively prime" to each other.
Another suggestion is to derive a fast "jump ahead" algorithm, allowing
parallel work-sharing with the same polynomial. Just picking different
seeds and praying they will not overlap/interact is risky.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/jump-seta-lfsr.pdf

The other option is to observe that random numbers can usually be
produced much faster than they are consumed. That is because the simple
integer manipulation in the PRNG is much less expensive than the
complicated floating-point maths in numerical algorithms. If we can
only consume a fraction of the 138 million random numbers per second
from KISS4691, we don't need a parallel PRNG.

We can let one processor fill N spinlock-synchronized queues (FIFOs)
with pseudorandom integers from a single KISS4691 sequence. The queues
then serve as sources of random numbers for N effectively independent
consumers. The synchronization overhead is amortized by buffering
inside the PRNG front-end that harvests a queue. E.g. with an internal
buffer of 256 integers, one only has to acquire the spinlock and
harvest the queue once every 256 calls to the PRNG. The best buffer
size depends on the hardware (e.g. the memory hierarchy).

Here we could note that KISS ("keep it simple stupid") is fast compared
to the Mersenne Twister, which makes it more suitable for this kind of
parallel scheme.
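A rough C11 sketch of what such a scheme could look like (the names
fifo_t, next_random, BATCH etc. are my own, a xorshift generator stands
in for KISS4691, and a real implementation would tune FIFO_SIZE/BATCH
and back off instead of busy-spinning):

    /* build with e.g.:  cc -O2 -pthread queues.c */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdatomic.h>
    #include <pthread.h>

    #define FIFO_SIZE 4096        /* must be a power of two */
    #define BATCH      256        /* integers harvested per lock acquisition */

    typedef struct {
        uint32_t    buf[FIFO_SIZE];
        unsigned    head, tail;   /* tail - head = number of queued items */
        atomic_flag lock;
    } fifo_t;

    static void fifo_lock(fifo_t *q)   { while (atomic_flag_test_and_set(&q->lock)) ; }
    static void fifo_unlock(fifo_t *q) { atomic_flag_clear(&q->lock); }

    static fifo_t queue = { .head = 0, .tail = 0, .lock = ATOMIC_FLAG_INIT };

    /* stand-in for KISS4691: a 32-bit xorshift generator */
    static uint32_t xorshift32(uint32_t *s)
    {
        uint32_t x = *s;
        x ^= x << 13;  x ^= x >> 17;  x ^= x << 5;
        return *s = x;
    }

    /* producer: keeps the FIFO topped up from a single PRNG stream
       (a real version would back off or sleep when the FIFO is full) */
    static void *producer(void *arg)
    {
        uint32_t state = 2463534242u;
        (void) arg;
        for (;;) {
            fifo_lock(&queue);
            while (queue.tail - queue.head < FIFO_SIZE)
                queue.buf[queue.tail++ & (FIFO_SIZE - 1)] = xorshift32(&state);
            fifo_unlock(&queue);
        }
        return NULL;
    }

    /* consumer front-end: refills a private buffer of BATCH integers,
       so the spinlock is taken only once per BATCH calls */
    typedef struct { uint32_t buf[BATCH]; int n; } frontend_t;

    static uint32_t next_random(frontend_t *f)
    {
        if (f->n == 0) {
            fifo_lock(&queue);
            while (queue.tail - queue.head < BATCH) {  /* wait for the producer */
                fifo_unlock(&queue);
                fifo_lock(&queue);
            }
            for (int i = 0; i < BATCH; i++)
                f->buf[i] = queue.buf[queue.head++ & (FIFO_SIZE - 1)];
            fifo_unlock(&queue);
            f->n = BATCH;
        }
        return f->buf[--f->n];
    }

    int main(void)
    {
        pthread_t tid;
        frontend_t fe = { .n = 0 };
        pthread_create(&tid, NULL, producer, NULL);
        for (int i = 0; i < 5; i++)
            printf("%u\n", next_random(&fe));
        return 0;   /* exiting main also ends the producer thread */
    }

The point is simply that the spinlock is touched once per BATCH
numbers, so the synchronization cost per random number becomes
negligible.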
From: Ron Shepard on 1 Aug 2010 21:19

In article
<e5f36412-ddf2-4ed5-9014-9d23e2c0da88(a)u4g2000prn.googlegroups.com>,
orz <cdhorz(a)gmail.com> wrote:

> If an RNG has a large number of possible states (say, 2^192) then it's
> unlikely for any two runs to EVER reach identical states by any means
> other than starting from the same seed. This RNG has a period vastly
> in excess of "large", so for practical purposes it just won't happen.

If you think about it for a second, you can see that this is not true
for a PRNG with the standard Fortran interface. This interface allows
the seed values to be both input and output. The purpose of this
(presumably, although it is not required by the standard) is to allow
the seed to be extracted at the end of one run and then used to
initialize the seed for the next run. This eliminates the possibility
of the two runs having overlapping sequences (provided the PRNG cycle
is long enough, of course).

But the ability to read the internal state of the PRNG also allows the
user to generate the exact same sequence twice (useful for debugging,
for example). Or the user could generate two sequences that have
arbitrary numbers of overlapping values (the sequences offset by 1,
offset by 2, offset by 100, or whatever). I can't think of any
practical reason for doing this. However, since it can be done with
100% certainty, it is not "unlikely" for it to occur, assuming the user
wants it to happen.

In a previous post, I suggested a new (intrinsic Fortran) subroutine of
the form

   call RANDOM_NEW_SEED( old_seed, n, new_seed )

One way to implement this subroutine would be for new_seed(:) to be the
internal state corresponding to the (n+1)-th element of the sequence
that results (or would result) from initialization with old_seed(:).
Of course, for this to be practical, especially for large values of n,
the PRNG algorithm would have to allow the internal state for arbitrary
elements within a sequence to be determined in this way. That places
practical restrictions on which PRNG algorithms can be used with this
interface, especially if additional features such as independence of
the separate sequences are also required.

$.02 -Ron Shepard
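To illustrate the kind of jump-ahead such an interface implies, here is
a C sketch for a simple 64-bit LCG (a toy example of mine, not the
algorithm behind any compiler's RANDOM_SEED; the constants are Knuth's
MMIX multiplier and increment). The state n steps ahead of old_seed is
computed in O(log n) operations by repeated squaring of the affine map
x -> a*x + c:

    #include <stdint.h>
    #include <stdio.h>

    #define LCG_A 6364136223846793005ULL   /* Knuth's MMIX multiplier */
    #define LCG_C 1442695040888963407ULL   /* and increment */

    static uint64_t lcg_next(uint64_t *state)
    {
        return *state = LCG_A * *state + LCG_C;   /* mod 2^64 is implicit */
    }

    /* the state n steps ahead of old_seed, in O(log n) operations */
    static uint64_t lcg_jump(uint64_t old_seed, uint64_t n)
    {
        uint64_t acc_mult = 1, acc_plus = 0;      /* identity map x -> x */
        uint64_t cur_mult = LCG_A, cur_plus = LCG_C;
        while (n > 0) {
            if (n & 1) {                          /* compose acc with cur */
                acc_plus = acc_plus * cur_mult + cur_plus;
                acc_mult = acc_mult * cur_mult;
            }
            cur_plus = (cur_mult + 1) * cur_plus; /* square the affine map */
            cur_mult = cur_mult * cur_mult;
            n >>= 1;
        }
        return acc_mult * old_seed + acc_plus;
    }

    int main(void)
    {
        uint64_t s = 12345, n = 1000000;
        uint64_t jumped = lcg_jump(s, n);
        for (uint64_t i = 0; i < n; i++)          /* step n times the slow way */
            lcg_next(&s);
        printf("stepped: %llu\njumped:  %llu\n",
               (unsigned long long) s, (unsigned long long) jumped);
        return 0;
    }

Generators like MT19937 admit analogous jump-ahead procedures (see the
papers cited earlier in the thread), but they are considerably more
involved than the LCG case.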
From: sturlamolden on 1 Aug 2010 21:29
I did a speed comparison on my laptop of KISS4691 against Mersenne
Twister 19937. KISS4691 produced about 110 million random numbers per
second, whereas MT19937 produced 118 million. If the compiler was
allowed to inline KISS4691, the performance increased to 148 million
per second. The function call overhead is important: unless it is
eliminated, MT19937 is actually faster than KISS4691.

Also note that a SIMD-oriented version of MT19937 is twice as fast as
the one used in my tests, and there is also a version of the Mersenne
Twister suitable for parallel processing and GPUs. So a parallel or
SIMD version of the Mersenne Twister is likely to be a faster PRNG than
KISS4691.

Then for the question of numerical quality: is KISS4691 better than
MT19937? I don't feel qualified to answer this. Marsaglia is the
world's foremost authority on random number generators, so I trust his
advice, but MT19937 is claimed to give impeccable quality for
everything except cryptography.

For those who need details: the speed test was basically running the
PRNGs for five seconds or more, querying Windows' performance counter
every ten million calls, and finally subtracting an estimate of the
timing overhead. The C compiler was Microsoft C/C++ version
15.00.30729.01 (the only optimization flag I used was /O2, i.e.
maximize speed). The processor is an AMD Phenom N930 @ 2.0 GHz.

Sturla Molden
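For reference, a portable approximation of that kind of measurement
(just a sketch of mine: it uses the standard C clock() instead of
Windows' performance counter, a trivial placeholder generator, and it
omits the timing-overhead subtraction and the inlining variations):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint32_t prng_state = 123456789u;

    static uint32_t my_prng(void)   /* placeholder for the generator under test */
    {
        uint32_t x = prng_state;
        x ^= x << 13;  x ^= x >> 17;  x ^= x << 5;
        return prng_state = x;
    }

    int main(void)
    {
        const uint64_t block = 10000000;   /* calls between clock queries */
        uint64_t total = 0;
        volatile uint32_t sink = 0;        /* keeps the calls from being optimized away */
        clock_t t0 = clock(), t1;
        double  elapsed;

        do {
            for (uint64_t i = 0; i < block; i++)
                sink ^= my_prng();
            total += block;
            t1 = clock();
            elapsed = (double) (t1 - t0) / CLOCKS_PER_SEC;
        } while (elapsed < 5.0);           /* run for at least five seconds */

        printf("%.1f million numbers per second\n", total / elapsed / 1e6);
        return 0;
    }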