From: mike on 21 Mar 2010 18:34

In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>, rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <OKCdnXnE3JeATwPWnZ2dnUVZ8jVi4p2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> >> mike wrote:
> >> <snip>
> >>
> >>> I think that as a reasonable compromise I am willing to admit that, for
> >>> any moderately long search string that does not have a lot of copies of
> >>> the first n letters of the string scattered through the rest of the
> >>> string, your estimate is probably as exact as anyone would care to
> >>> require. My slightly unfair examples only emphasised the difference
> >>> between your prediction and reality because they were fairly short and
> >>> the 'pattern' in the strings influenced the probability.
> >> So it's reasonable compromises now, is it?
> >>
> > I never said it wasn't.
> >
> > If you remember:
> >
> > 1) Someone else mentioned the probability of hitting the right text.
> > 2) You provided a formula to calculate what that probability was.
> > 3) Someone else pointed out that your formula was incorrect (and why).
> > 4) You admitted the fact and asked for a better formula.
> > 5) I provided an exact solution...
> > 6) ...which you suggested was computationally difficult, and asked for a
> > compromise solution.
> > 7) I pointed out that your initial solution would be 'good enough' in
> > normal circumstances.
> >
> > At no point did I suggest that your formula was not a reasonable
> > compromise. All I did was provide you with what you requested and, for
> > illustrative purposes, describe some circumstances where your solution
> > would be inadequate.
> >
> Ah - we have here a light-hearted all-Usenauts-together reply taken far
> too literally, and thoroughly but unnecessarily rebuffed; folks, things
> were touch and go there for a while, but Usenet is back to normal again!
>
> :-)
>
> (Sorry for the late reply - I've been kinda busy.)
>
And sorry if I appeared to overreact - the coffee machine was empty.

Mike
From: Richard Heathfield on 22 Mar 2010 02:50

mike wrote:
> In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> rjh(a)see.sig.invalid says...
<snip>

>> (Sorry for the late reply - I've been kinda busy.)
>>
> And sorry if I appeared to overreact - the coffee machine was empty.

I feel your pain; so, as soon as I've posted this article, I'm going to
fax some coffee to you, to tide you over until the vendor's next delivery.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
From: mike on 22 Mar 2010 18:56

In article <ndydnWHHL76kjjrWnZ2dnUVZ8kti4p2d(a)bt.com>, rjh(a)see.sig.invalid says...
> mike wrote:
> > In article <He-dnXLmZ6o_VzjWnZ2dnUVZ8l2dnZ2d(a)bt.com>,
> > rjh(a)see.sig.invalid says...
> <snip>
>
> >> (Sorry for the late reply - I've been kinda busy.)
> >>
> > And sorry if I appeared to overreact - the coffee machine was empty.
>
> I feel your pain; so, as soon as I've posted this article, I'm going to
> fax some coffee to you, to tide you over until the vendor's next delivery.
>
I have invested in a portable filter cup and a supply of fresh grounds
now - so I can avoid a recurrence of the above. But I do look forward to
your fax.

Further to the original problem though, I believe (but have not taken
the time to prove) that in practice, if we were looking for an
uncompressed ASCII subtext in a large random (monkey-generated) string,
then the small variations in probability from your approximation would
be due to repetitions of the first few characters of the subtext - and
only the first few repetitions would make any significant difference.

So I believe it might be practical to take just the first few characters
(maybe in the range of 100-1000) of the subtext, determine the influence
of pattern on the probability of finding that text, and then extrapolate
with your approximation for the rest of the subtext.

So, for example, if we chose the first 1000 out of a 10000-character
substring, then it would be a simple process (and a relatively modest
amount of processing) to repeatedly square the 1000x1000 array a few
dozen times to determine the probability of finding those 1000
characters within a 10^14-10^15 character monkey string (for values of
'few' approximately equal to 4, since 48 squarings cover 2^48, or about
2.8 x 10^14, characters). Then we just scale that probability by your
approximation of finding the remaining 9000 characters.

Some care might be needed to ensure that the array elements were
floating point numbers with sufficient resolution to avoid round-off
errors during the process.

Mike
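
For anyone who wants to play with the repeated-squaring idea above, here is a minimal sketch. It rests on assumptions the post does not spell out: the "array" is taken to be the transition matrix of a KMP-style matching automaton with an absorbing "found" state, the monkey types each symbol independently and uniformly, and the function names (`transition_matrix`, `hit_probability`) are made up for illustration. It is pure Python and only practical for toy patterns.

```python
def kmp_failure(pattern):
    """fail[i] = length of the longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def transition_matrix(pattern, alphabet):
    """One-keystroke transition matrix of the matching automaton.

    State s means "the last s characters typed match pattern[:s]".
    State len(pattern) is absorbing: the pattern has been found.
    Assumes every symbol of `alphabet` is equally likely.
    """
    m = len(pattern)
    fail = kmp_failure(pattern)
    p = 1.0 / len(alphabet)
    t = [[0.0] * (m + 1) for _ in range(m + 1)]
    for state in range(m):
        for ch in alphabet:
            k = state
            while k and ch != pattern[k]:
                k = fail[k - 1]
            if ch == pattern[k]:
                k += 1
            t[state][k] += p
    t[m][m] = 1.0  # once found, stay found
    return t


def mat_mul(a, b):
    """Plain O(n^3) matrix product on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def hit_probability(pattern, alphabet, squarings):
    """Probability that `pattern` occurs within 2**squarings random symbols."""
    t = transition_matrix(pattern, alphabet)
    for _ in range(squarings):
        t = mat_mul(t, t)  # each squaring doubles the text length covered
    return t[0][len(pattern)]


if __name__ == "__main__":
    # Same length, same alphabet, different self-overlap.
    print(hit_probability("AA", "AB", 2))
    print(hit_probability("AB", "AB", 2))
```

With a two-letter alphabet and four keystrokes (`squarings=2`), "AA" is found with probability 0.5 but "AB" with probability 0.6875, which is the self-overlap effect being discussed. For a real 1000-character prefix over 27 or so symbols, the hit probability would be far below the range of a double, so something like `fractions.Fraction` or log-scaled arithmetic would probably be needed, at a considerable cost in speed.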