From: Richard Heathfield on
mike wrote:
> In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> rjh(a)see.sig.invalid says...
<snip>

>> (Sorry for the late reply - I've been kinda busy.)
>>
> And sorry if I appeared to overreact - the coffee machine was empty.

I feel your pain; so, as soon as I've posted this article, I'm going to
fax some coffee to you, to tide you over until the vendor's next delivery.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: mike on
In article <ndydnWHHL76kjjrWnZ2dnUVZ8kti4p2d(a)bt.com>,
rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> <snip>
>
> >> (Sorry for the late reply - I've been kinda busy.)
> >>
> > And sorry if I appeared to overreact - the coffee machine was empty.
>
> I feel your pain; so, as soon as I've posted this article, I'm going to
> fax some coffee to you, to tide you over until the vendor's next delivery.
>
I have invested in a portable filter cup and a supply of fresh grounds
now - so I can avoid a recurrence of the above. But I do look forward to
your fax.

Further to the original problem though, I believe (but have not taken
the time to prove) that in practice, if we were looking for an
uncompressed ASCII subtext in a large random (monkey-generated) string,
then the small variations in probability from your approximation would
be due to repetitions of the first few characters of the subtext - and
only the first few repetitions would make any significant difference. So
I believe it might be practical to take just the first few characters
(maybe in the range of 100-1000) of the subtext, determine the influence
of pattern on the probability of finding that text, and then extrapolate
with your approximation for the rest of the subtext. So, for example, if
we chose the first 1000 characters of a 10000-character subtext, then it
would be a simple process (and a relatively modest amount of processing)
to repeatedly square the 1000x1000 array a few dozen times to determine
the probability of finding those 1000 characters within a 10^14-10^15
character monkey string (for values of 'few' approximately equal to 4,
since 2^48 is about 2.8 x 10^14). Then we just scale that probability by
your approximation of finding the remaining 9000 characters. Some care
might be needed to ensure that the array elements were floating point
numbers with sufficient resolution to avoid round-off errors during the
process.
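
A rough sketch of the repeated-squaring idea, in Python, on a toy scale.
It is only an illustration under assumptions of mine: the array is taken
to be the transition matrix of a prefix-matching automaton (state i
means "the last i characters typed match the first i characters of the
subtext", plus one absorbing "found" state), the monkey types each
character of the alphabet with equal probability, and the names
build_transition_matrix, mat_mul and hit_probability are made up for
this example.

```python
def build_transition_matrix(pattern, alphabet):
    """Row-stochastic matrix of a prefix-matching automaton (assumed model).

    State i: the last i characters typed equal pattern[:i].
    State len(pattern) is absorbing: the pattern has been found.
    """
    m = len(pattern)
    p = 1.0 / len(alphabet)  # assumption: uniform random typing
    matrix = [[0.0] * (m + 1) for _ in range(m + 1)]
    for state in range(m):
        for ch in alphabet:
            typed = pattern[:state] + ch
            # Longest prefix of the pattern that is a suffix of `typed`.
            k = len(typed)
            while k > 0 and typed[-k:] != pattern[:k]:
                k -= 1
            matrix[state][k] += p
    matrix[m][m] = 1.0
    return matrix


def mat_mul(a, b):
    """Plain O(n^3) product of two square matrices given as lists of lists."""
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            if aik:
                row_b, row_r = b[k], result[i]
                for j in range(n):
                    row_r[j] += aik * row_b[j]
    return result


def hit_probability(pattern, alphabet, squarings):
    """Probability that `pattern` occurs within 2**squarings random characters."""
    matrix = build_transition_matrix(pattern, alphabet)
    for _ in range(squarings):
        matrix = mat_mul(matrix, matrix)  # M -> M^2 -> M^4 -> ...
    return matrix[0][len(pattern)]


if __name__ == "__main__":
    # Two-letter alphabet, 4 characters typed (2 squarings).
    print(hit_probability("ab", "ab", 2))  # no self-overlap
    print(hit_probability("aa", "ab", 2))  # overlaps itself
```

On this toy case the two subtexts have the same length, yet "ab" turns
up in 11 of the 16 possible four-character strings and "aa" in only 8,
which is the effect of repetition at the start of the subtext described
above. For a 1000-character subtext the probabilities involved may well
be too small for ordinary double-precision floats, so the element type
would need more thought than this sketch gives it.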

Mike