From: Mok-Kong Shen on 4 Apr 2010 17:29

David Sexton wrote:
> Doesn't the statistic equal the number of standard deviations above
> population mean?
>
> That makes sense because the example gives a threshold of 1.96 for a
> 0.05 level of significance. These numbers (according to the text)
> come from Table 5.1. Table 5.1 is "Selected Percentiles of the
> Standard Normal Distribution."

Sorry for my poor comprehension. Could you explain a bit more how (why) the A(d), which is the xor of the bits at distance d, goes in that "specific" form into the formula for the stated test statistic there?

M. K. Shen
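For readers checking the 1.96 figure quoted above: a minimal sketch, using only Python's standard library, of where such a threshold comes from in the standard normal distribution. The variable names are illustrative, and whether the test in the text is meant to be one-sided or two-sided is not settled in this thread, so both critical values are computed.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-sided test at the 0.05 level: reject when |z| exceeds this value.
two_sided = std_normal.inv_cdf(1 - 0.05 / 2)

# One-sided test at the 0.05 level: reject when z exceeds this value.
one_sided = std_normal.inv_cdf(1 - 0.05)

print(round(two_sided, 2), round(one_sided, 3))  # prints: 1.96 1.645
```

The value 1.96 is the 97.5th percentile of the standard normal, which is the cutoff for a two-sided test at the 0.05 level (or a one-sided test at 0.025).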
From: Mok-Kong Shen on 9 Apr 2010 01:34
David Sexton wrote:
> Mok-Kong Shen wrote:
>> Sorry for my poor comprehension. Could you explain a bit more how
>> (why) the A(d), which is the xor of the bits at distance d, goes in that
>> "specific" form into the formula for the stated test statistic there?
>
> For that test I would use, and have used, a chi-squared statistic.
> Nevertheless, I should be able to remember what this one-sided test
> statistic is called. I don't. I'll keep thinking about it and
> looking around; it bothers me that I can't remember.
>
> In the meantime.... The probability that a bit will be different
> than another bit in the sequence at a given offset should be 0.5. In
> the formula, (n - d) is the number of bits considered. There are no
> bits at offset "d" with which to xor the other "d" bits in the
> sequence, which is "n" bits long. So, (n - d) bits all have a 0.5
> probability of being 1. The "expected" number of 1s is (n - d) / 2.
> A(d) - (n - d)/2 is the difference between the observed number of 1s
> and the expected number of 1s.
>
> The whole statistic must be equal to the number of standard deviations
> away from (n - d)/2 of A(d). The test depends on the fact that, for a
> large number of trials, the binomial distribution approximates the
> normal (Gaussian) distribution.

Thank you very much for explaining the statistic in question. As to the method you would use instead, could it be the one named the serial correlation test (cf. Knuth vol. 2, 1998, p. 72)? I surmise (I am not sure) that in the present case of a bit sequence with 0/1 as bit values one would need to first change the 0 to -1 before computing the autocorrelation coefficients for the various lags. These coefficients, having values lying between -1 and +1, are then subjected to a chi-square test. Could I be right in this?

Thanks,

M. K. Shen
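The explanation quoted above can be sketched in code. This is a minimal illustration, not the formula as printed in the text under discussion: it assumes A(d) is the count of positions where bit i and bit i + d differ, and it uses the binomial standard deviation sqrt(n - d) / 2 that follows from (n - d) independent trials with probability 0.5. The function names are hypothetical. The second function is a simplified lag-d correlation on bits mapped to -1/+1 that assumes a zero mean rather than estimating it, so it is not Knuth's serial correlation coefficient.

```python
import math
import random

def autocorrelation_statistic(bits, d):
    """Standard deviations by which A(d) deviates from (n - d) / 2."""
    n = len(bits)
    if not 1 <= d < n:
        raise ValueError("d must satisfy 1 <= d < n")
    m = n - d  # number of bit pairs that can be XORed
    # A(d): number of positions where bit i differs from bit i + d.
    a_d = sum(bits[i] ^ bits[i + d] for i in range(m))
    mean = m / 2             # expected count if each XOR is 1 with prob. 0.5
    std = math.sqrt(m) / 2   # sqrt(m * 0.5 * 0.5), binomial standard deviation
    return (a_d - mean) / std

def lag_correlation(bits, d):
    """Mean product of the -1/+1 mapped bits at lag d (zero mean assumed)."""
    s = [2 * b - 1 for b in bits]  # map 0 -> -1, 1 -> +1
    m = len(bits) - d
    return sum(s[i] * s[i + d] for i in range(m)) / m

if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed, chosen arbitrarily
    sample = [rng.getrandbits(1) for _ in range(10_000)]
    for lag in (1, 2, 8):
        z = autocorrelation_statistic(sample, lag)
        r = lag_correlation(sample, lag)
        print(lag, z, r)
```

Since the product of two -1/+1 values equals 1 - 2 * (xor of the original bits), the simplified correlation is 1 - 2 * A(d) / (n - d), and the statistic equals -sqrt(n - d) times that correlation. Under these assumptions, the XOR count and the -1/+1 mapping carry the same information for a single lag.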