Prev: solutions book
Next: real kind declaration
From: Ilmari Karonen on 31 Jul 2010 10:25 ["Followup-To:" header set to sci.math.] On 2010-07-30, Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote: > In article <i2tvn7$ldf$1(a)smaug.linux.pwf.cam.ac.uk>, nmm1(a)cam.ac.uk > wrote: >> In article <ron-shepard-969C9C.22130629072010(a)news60.forteinc.com>, >> Ron Shepard <ron-shepard(a)NOSPAM.comcast.net> wrote: >> > >> >The situation that needs to be avoided is to run the code once with one >> >seed, and then run it again with another seed that results in an overlap >> >of the sequences of values for the two runs. In some applications this >> >is unimportant, but in other applications it would be bad for two >> >supposedly independent runs to have a long sequence of identical values. >> >> No, no, a thousand times, NO! That is NOT enough, though FAR too many >> Web pages, published papers and books claim that it is. Disjointness >> isn't even a poor relation of randomness. > [snip] > >> Parallel random number generation is not easy, and 99% of the stuff >> published on it is somewhere between grossly misleading and complete >> nonsense. > > I think in the parallel case, one would want to be able to generate > a seed to produce values that are guaranteed not to overlap with any > other node. Maybe something like > > call RANDOM_NEW_SEED( old_seed, n, new_seed, my_node ) > > would be a sufficient interface. new_seed(:) would depend on > my_node in such a way that the generated sequence would not overlap > with that produced by any other possible value of my_node (again, > assuming the cycle length is long enough to satisfy that request). Let me try to demonstrate, using a simplified (and admittedly somewhat artificial) counterexample, why lack of overlap is not a sufficient condition for independence: Assume we have a "random oracle" R: Z -> [0,1) which takes as its input an integer and returns a uniformly distributed random real number between 0 and 1, such that the same input always produces the same output, but that the outputs for different inputs are completely independent. Given such an oracle, we can construct a perfect PRNG P: Z -> [0,1)^N which takes as its input an integer k, and returns the sequence <R(k), R(k+1), R(k+2), ...>. Obviously, the sequence generated by P would be indistinguishable from random (since it is indeed, by definition, random) and non-repeating. Now, what if we wanted several independent streams of random numbers? Obviously, we can't just pass different seed values to P, since we know that the streams would eventually overlap. We could solve the problem by modifying P, e.g. to make it return the sequence <R(f(k,0)), R(f(k,1)), R(f(k,2)), ...> instead, where f is an injective function from Z^2 to Z. But what if we wanted to do it _without_ modifying P? One _bad_ solution would be to define a new generator Q: Z -> [0,1)^N as Q(k)_i = frac(P(0)_i + ak), where a is some irrational number and frac: R -> [0,1) returns the part of a real number "after the decimal point" (i.e. frac(x) = x - floor(x)). Clearly, the sequences returned by Q for different seed values would never overlap, and, individually, they would each still be perfectly random. Yet, just as clearly, all those sequences would be related by a very trivial linear relation that would be blatantly obvious if you, say, plotted two of them against each other. The _right_ solution, as I suggested above, would've been to redesign the underlying generator so that the different streams would be not just non-overlapping but actually statistically independent. The same general advice holds for practical PRNGs too, not just for my idealized example. You can't just take any arbitrary PRNG, designed for generating a _single_ stream of random-seeming numbers, and expect the output to still look random if you get to compare several distinct output streams generated from related seeds. It might turn out that it does look so, if the designer of the PRNG was careful or just lucky. But designing a PRNG to satisfy such a requirement is a strictly harder problem than simply designing it to generate a single random-looking stream. -- Ilmari Karonen To reply by e-mail, please replace ".invalid" with ".net" in address.
From: Ron Shepard on 1 Aug 2010 21:19 In article <e5f36412-ddf2-4ed5-9014-9d23e2c0da88(a)u4g2000prn.googlegroups.com>, orz <cdhorz(a)gmail.com> wrote: > If an RNG has a large number of possible states (say, 2^192) then it's > unlikely for any two runs to EVER reach identical states by any means > other than starting from the same seed. This RNG has a period vastly > in excess of "large", so for practical purposes it just won't > happen. If you think about it for a second, you can see that this is not true for a PRNG with the standard fortran interface. This interface allows the seed values to be both input and output. The purpose (presumably, although this is not required by the standard) for this is to allow the seed to be extracted at the end of one run, and then used to initiate the seed for the next run. This eliminates the possibility of the two runs having overlapping sequences (provided the PRNG cycle is long enough, of course). But, with the ability to read the internal state of the PNRG, this would allow the user to generate the exact same sequence twice (useful for debugging, for example). Or the user could generate two sequences that have arbitrary numbers of overlapping values (the sequences offset by 1, offset by 2, offset by 100, or whatever). I can't think of any practical reason for doing this. However, since it can be done with 100% certainty, it is not "unlikely" for it to occur, assuming the user wants it to happen. In a previous post, I suggested a new (intrinsic fortran) subroutine of the form call RANDOM_NEW_SEED( old_seed, n, new_seed ) One way to implement this subroutine would be for new_seed(:) to be the internal state corresponding to the (n+1)-th element of the sequence that results (or would result) from initialization with old_seed(:). Of course, for this to be practical, especially for large values of n, this would require the PRNG algorithm to allow the internal state for arbitrary elements within a sequence to be determined in this way. That would place practical restrictions on which PRNG algorithms can be used with this interface, especially if additional features such as independence of the separate sequences are also required. $.02 -Ron Shepard
From: sturlamolden on 1 Aug 2010 21:29 I did a speed comparison on my laptop of KISS4691 against Mersenne Twister 19937. KISS4691 produced about 110 million random numbers per second, whereas MT19937 produced 118 million. If the compiler was allowed to inline KISS4691, the performance increased to 148 millon per second. The function call overhead is imporant, and without taking it into consideration, MT19937 will actually be faster than KISS4691. Also note that a SIMD oriented version of MT19937 is twice as fast as the one used in my tests. There is also a version of Mersenne Twister suitable for parallel processing and GPUs. So a parallel or SIMD version of Mersenne Twister is likely to be a faster PRNG than KISS4691. Then for the question of numerical quality: Is KISS4691 better than MT19937? I don't feel qualified to answer this. Marsaglia is the world's foremost authority on random number generators, so I trust his advice, but MT19937 is claimed to give impeckable quality except for crypto. For those who need details: The speed test was basically running the PRNGs for five seconds or more, querying Windows' performance counter on every ten million call, and finally subtracting an estimate of the timing overhead. The C compiler was Microsoft C/C++ version 15.00.30729.01 (the only optimization flag I used was /O2, i.e. maximize for speed). The processor is AMD Phenom N930 @ 2.0 Gz. Sturla Molden
From: glen herrmannsfeldt on 1 Aug 2010 22:29 In comp.lang.fortran Ron Shepard <ron-shepard(a)nospam.comcast.net> wrote: (snip) > In a previous post, I suggested a new (intrinsic fortran) subroutine of > the form > call RANDOM_NEW_SEED( old_seed, n, new_seed ) > One way to implement this subroutine would be for new_seed(:) to be the > internal state corresponding to the (n+1)-th element of the sequence > that results (or would result) from initialization with old_seed(:). Of > course, for this to be practical, especially for large values of n, this > would require the PRNG algorithm to allow the internal state for > arbitrary elements within a sequence to be determined in this way. That > would place practical restrictions on which PRNG algorithms can be used > with this interface, especially if additional features such as > independence of the separate sequences are also required. The standard mostly does not specify what is, or is not, practical. DO I=1,INT(1000000,SELECTED_INT_KIND(20))**3 ENDDO For many generators, one should not specify too large a value for the above RANDOM_NEW_SEED routine. -- glen
From: Noob on 2 Aug 2010 07:06
sturlamolden wrote: > [...] MT19937 is claimed to give impeckable quality [...] It cannot be eaten in small bites? ;-) |