From: Tom Lane on 30 May 2010 18:07

Jan Urbański <wulczer(a)wulczer.org> writes:
> Here's a patch against recent git, but should apply to 8.4 sources as
> well. It would be interesting to measure the memory and time needed to
> analyse the table after applying it, because we will be now using a lot
> bigger bucket size and I haven't done any performance impact testing on
> it.

I did a little bit of testing using a dataset I had handy (a couple
hundred thousand publication titles) and found that ANALYZE seems to be
noticeably but far from intolerably slower --- it's almost the same
speed at statistics targets up to 100, and even at the max setting of
10000 it's only maybe 25% slower. However I'm not sure if this result
will scale to very large document sets, so more testing would be a good
idea.

I committed the attached revised version of the patch. Revisions are
mostly minor but I did make two substantive changes:

* The patch changed the target number of mcelems from 10 *
statistics_target to just statistics_target. I reverted that since I
don't think it was intended; at least we hadn't discussed it.

* I modified the final processing to avoid one qsort step if there are
fewer than num_mcelems hashtable entries that pass the cutoff frequency
filter, and in any case to sort only those entries that pass it rather
than all of them. With the significantly larger number of hashtable
entries that will now be used, it seemed like a good thing to try to
cut the qsort overhead.

			regards, tom lane
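For readers unfamiliar with that second change, here is a minimal C
sketch of the filter-before-sort idea. It only illustrates the shape of
the optimization: entries below the cutoff frequency are discarded
first, and the qsort is skipped entirely when the survivors already fit
within num_mcelems. The TrackItem struct, the sample data, and the
compare_freq_desc helper are hypothetical stand-ins, not code from the
committed ts_typanalyze.c.

    /* Sketch only: filter hashtable entries, then sort only if needed. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct TrackItem
    {
        const char *lexeme;
        int         frequency;      /* occurrence count from the sample */
    } TrackItem;

    static int
    compare_freq_desc(const void *a, const void *b)
    {
        const TrackItem *ta = (const TrackItem *) a;
        const TrackItem *tb = (const TrackItem *) b;

        return tb->frequency - ta->frequency;   /* descending by frequency */
    }

    int
    main(void)
    {
        /* Pretend these came out of the lossy-counting hashtable. */
        TrackItem   all[] = {
            {"postgres", 120}, {"analyze", 95}, {"qsort", 40},
            {"bucket", 12}, {"patch", 7}, {"rare", 2}
        };
        int         nitems = sizeof(all) / sizeof(all[0]);
        int         num_mcelems = 4;    /* how many MCEs to keep */
        int         cutoff_freq = 5;    /* minimum count to be considered */
        TrackItem  *passing;
        int         npassing = 0;
        int         i;

        /* First pass: keep only entries above the cutoff frequency. */
        passing = malloc(nitems * sizeof(TrackItem));
        for (i = 0; i < nitems; i++)
        {
            if (all[i].frequency > cutoff_freq)
                passing[npassing++] = all[i];
        }

        /*
         * Sort only when more entries pass than we can keep; otherwise
         * every survivor becomes an MCE and no qsort is needed at all.
         */
        if (npassing > num_mcelems)
        {
            qsort(passing, npassing, sizeof(TrackItem), compare_freq_desc);
            npassing = num_mcelems;
        }

        for (i = 0; i < npassing; i++)
            printf("%s: %d\n", passing[i].lexeme, passing[i].frequency);

        free(passing);
        return 0;
    }

In either branch, only the entries that pass the filter are touched,
which is where the savings come from once the hashtable is much larger.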
From: Jan Urbański on 30 May 2010 18:24

On 31/05/10 00:07, Tom Lane wrote:
> Jan Urbański <wulczer(a)wulczer.org> writes:
> I committed the attached revised version of the patch. Revisions are
> mostly minor but I did make two substantive changes:
>
> * The patch changed the target number of mcelems from 10 *
> statistics_target to just statistics_target. I reverted that since
> I don't think it was intended; at least we hadn't discussed it.

Yeah, that was accidental.

> * I modified the final processing to avoid one qsort step if there are
> fewer than num_mcelems hashtable entries that pass the cutoff frequency
> filter, and in any case to sort only those entries that pass it rather
> than all of them. With the significantly larger number of hashtable
> entries that will now be used, it seemed like a good thing to try to
> cut the qsort overhead.

Makes sense. Thanks,

Jan
From: Jesper Krogh on 31 May 2010 14:12

On 2010-05-30 20:02, Jan Urbański wrote:
> Here's a patch against recent git, but should apply to 8.4 sources as
> well. It would be interesting to measure the memory and time needed to
> analyse the table after applying it, because we will be now using a lot
> bigger bucket size and I haven't done any performance impact testing on
> it. I updated the initial comment block in compute_tsvector_stats, but
> the prose could probably be improved.

Just a small follow-up. I tried out the patch (or actually a fresh git
checkout) and it now gives very accurate results for both the upper and
lower end of the MCE histogram, with a lower cutoff that doesn't
approach 2. Thanks a lot.

--
Jesper
From: Tom Lane on 31 May 2010 14:38

Jesper Krogh <jesper(a)krogh.cc> writes:
> Just a small follow-up. I tried out the patch (or actually a fresh git
> checkout) and it now gives very accurate results for both the upper
> and lower end of the MCE histogram, with a lower cutoff that doesn't
> approach 2.

Good. How much did the ANALYZE time change for your table?

			regards, tom lane
From: Jesper Krogh on 1 Jun 2010 00:50
On 2010-05-31 20:38, Tom Lane wrote:
> Jesper Krogh <jesper(a)krogh.cc> writes:
>
>> Just a small follow-up. I tried out the patch (or actually a fresh git
>> checkout) and it now gives very accurate results for both the upper
>> and lower end of the MCE histogram, with a lower cutoff that doesn't
>> approach 2.
>
> Good. How much did the ANALYZE time change for your table?

1.3M documents.

New code (3 runs):
statistics target 1000 => 155s/124s/110s
statistics target 100  => 86s/55s/61s

Old code:
statistics target 1000 => 158s/101s/99s
statistics target 100  => 90s/29s/33s

I think the first run is the relevant one; it's pretty much a "dead
disk" test, and I wouldn't expect random sampling of tuples to have any
sane caching effect in a production system. But it looks like the
algorithm is "a bit" slower.

Thanks again.

--
Jesper