From: Richard Heathfield on 22 Mar 2010 02:50

mike wrote:
> In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> rjh(a)see.sig.invalid says...
<snip>
>> (Sorry for the late reply - I've been kinda busy.)
>>
> And sorry if I appeared to overreact - the coffee machine was empty.

I feel your pain; so, as soon as I've posted this article, I'm going to
fax some coffee to you, to tide you over until the vendor's next delivery.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

From: mike on 22 Mar 2010 18:56
In article <ndydnWHHL76kjjrWnZ2dnUVZ8kti4p2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> <snip>
>
> >> (Sorry for the late reply - I've been kinda busy.)
> >>
> > And sorry if I appeared to overreact - the coffee machine was empty.
>
> I feel your pain; so, as soon as I've posted this article, I'm going to
> fax some coffee to you, to tide you over until the vendor's next delivery.
>

I have now invested in a portable filter cup and a supply of fresh
grounds, so I can avoid a recurrence of the above. But I do look forward
to your fax.

Further to the original problem, though: I believe (but have not taken
the time to prove) that in practice, if we were looking for an
uncompressed ASCII subtext in a large random (monkey-generated) string,
then the small variations in probability from your approximation would
be due to repetitions of the first few characters of the subtext - and
only the first few repetitions would make any significant difference.

So I believe it might be practical to take just the first few characters
(maybe in the range of 100-1000) of the subtext, determine the influence
of the pattern on the probability of finding that text, and then
extrapolate with your approximation for the rest of the subtext.

So, for example, if we chose the first 1000 characters of a
10000-character subtext, it would be a simple process (and a relatively
modest amount of processing) to repeatedly square the 1000x1000
transition array a few dozen times to determine the probability of
finding those 1000 characters within a 10^14-10^15 character monkey
string (for values of 'few' approximately equal to 4: each squaring
doubles the string length covered, and 2^48 is about 2.8 x 10^14). Then
we just scale that probability by your approximation of the probability
of finding the remaining 9000 characters.

Some care might be needed to ensure that the array elements were
floating-point numbers with sufficient resolution to avoid round-off
errors during the process.

Mike
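
Editorial sketch (not part of the original thread): the repeated-squaring idea in the post above can be illustrated on a toy scale. The code below is a minimal Python sketch under several assumptions: the monkey types each character uniformly and independently, the matcher is a standard KMP-style automaton, and a pattern of length m needs m+1 states (the extra one being an absorbing "found" state), so the array is slightly larger than the post's round figure. The function names are hypothetical, and exact fractions are used only because the example is tiny; a 1000-character prefix would realistically need floating-point arrays, with the round-off caveat the post raises. The extrapolation step for the remaining characters is not shown, since the approximation it relies on is not quoted in this excerpt.

```python
from fractions import Fraction


def build_automaton(pattern, alphabet):
    """Return delta, where delta[state][ch] is the next matcher state.

    State k means "the last k characters typed equal pattern[:k]".
    State len(pattern) is absorbing: the pattern has already been seen.
    """
    m = len(pattern)
    # fail[j] = length of the longest proper border of pattern[:j]
    fail = [0] * (m + 1)
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k

    delta = []
    for state in range(m + 1):
        row = {}
        for ch in alphabet:
            if state == m:
                row[ch] = m  # stay in the "found" state
                continue
            k = state
            while k and pattern[k] != ch:
                k = fail[k]
            if pattern[k] == ch:
                k += 1
            row[ch] = k
        delta.append(row)
    return delta


def transition_matrix(pattern, alphabet):
    """One-keystroke transition probabilities for a uniform random typist.

    Assumes the characters in `alphabet` are distinct.
    """
    delta = build_automaton(pattern, alphabet)
    size = len(pattern) + 1
    p = Fraction(1, len(alphabet))
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for state in range(size):
        for ch in alphabet:
            matrix[state][delta[state][ch]] += p
    return matrix


def square(matrix):
    """Plain O(n^3) matrix product of a square matrix with itself."""
    size = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]


def match_probability(pattern, alphabet, squarings):
    """P(pattern occurs somewhere in a random string of length 2**squarings)."""
    matrix = transition_matrix(pattern, alphabet)
    for _ in range(squarings):
        matrix = square(matrix)
    # Start in state 0; the last column is the absorbing "found" state.
    return matrix[0][len(pattern)]


if __name__ == "__main__":
    print(match_probability("AA", "AB", 2))  # 1/2
    print(match_probability("AB", "AB", 2))  # 11/16
```

With a two-letter alphabet and a string of length 4 (two squarings), "AA" is found with probability 1/2 but "AB" with probability 11/16, even though both patterns have the same length. The difference comes entirely from how the pattern overlaps itself, which is the effect the post proposes to capture from a prefix of the subtext.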