From: Tom Lane on 28 Jul 2010 19:33

Oleg Bartunov <oleg(a)sai.msu.su> writes:
> you can download dump http://mira.sai.msu.su/~megera/tmp/search_tab.dump

Hmm ... I'm not sure why you're failing to reproduce it, because it's
falling over pretty easily for me. After poking at it for awhile,
I am of the opinion that scanGetItem's handling of multiple keys is
fundamentally broken and needs to be rewritten completely. The
particular case I'm seeing here is that one key returns this sequence of
TIDs/lossy flags:

    ....
    1085/4 0
    1086/65535 1
    1087/4 0
    ....

while the other one returns this:

    ....
    1083/11 0
    1086/6 0
    1086/10 0
    1087/10 0
    ....

and what comes out of scanGetItem is just

    ....
    1086/6 1
    ....

because after returning that, on the next call it advances both input
keystreams. So 1086/10 should be visited and is not.

I think that depending on the previous entryRes state to determine what
to do is basically unworkable, and what should probably be done instead
is to remember the last-returned TID and advance keystreams with TIDs <=
that. I haven't quite thought through how that should interact with
lossy-page TIDs but it seems more robust than what we've got.

I'm also noticing that the ANDing behavior for the "ee:* & dd:*" query
style seems very much stupider than it needs to be --- it's returning
lossy pages that very obviously don't need to be examined because the
other keystream has no match at all on that page. But I haven't had
time to probe into the reason why.

I'm out of time for today, do you want to work on it?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
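
The idea of remembering the last-returned TID and advancing only the keystreams whose current TID is <= that value can be illustrated with a small Python model. This is a sketch of the idea only, not the scanGetItem code or the fix that was eventually written. The function name `and_streams`, the tuple representation of TIDs, and in particular the handling of lossy-page entries are assumptions made for illustration, since the message above leaves the lossy-page interaction open.

```python
LOSSY = 65535  # offset that marks a whole-page (lossy) entry, as in the trace


def and_streams(a, b):
    """AND two sorted TID streams; return a list of (block, offset, recheck).

    Each stream is a sorted list of (block, offset) tuples (hypothetical
    representation). recheck=True means the heap tuple must be rechecked.
    """
    streams, pos = (a, b), [0, 0]
    last = None  # last TID handed back to the caller
    out = []
    while True:
        # Advance only the streams whose current TID is <= the last result.
        if last is not None:
            for i in (0, 1):
                while pos[i] < len(streams[i]) and streams[i][pos[i]] <= last:
                    pos[i] += 1
        if pos[0] >= len(a) or pos[1] >= len(b):
            return out
        x, y = a[pos[0]], b[pos[1]]
        if x[0] != y[0]:
            # Different heap pages: skip the stream that is behind.
            pos[0 if x[0] < y[0] else 1] += 1
            continue
        x_lossy, y_lossy = x[1] == LOSSY, y[1] == LOSSY
        if x_lossy and y_lossy:
            hit = (x[0], LOSSY, True)   # whole page, recheck needed
        elif x_lossy:
            hit = (y[0], y[1], True)    # exact TID from b, recheck needed
        elif y_lossy:
            hit = (x[0], x[1], True)    # exact TID from a, recheck needed
        elif x == y:
            hit = (x[0], x[1], False)   # exact match in both streams
        else:
            # Same page, different exact offsets: skip the smaller one.
            pos[0 if x < y else 1] += 1
            continue
        out.append(hit)
        last = hit[:2]


a = [(1085, 4), (1086, LOSSY), (1087, 4)]
b = [(1083, 11), (1086, 6), (1086, 10), (1087, 10)]
print(and_streams(a, b))
```

Because a lossy entry carries the largest offset on its page, it compares greater than any exact TID returned from that page, so the lossy stream stays in place while the other stream moves from 1086/6 to 1086/10. In this model both TIDs on page 1086 are therefore returned, each flagged for recheck.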
From: Tom Lane on 29 Jul 2010 10:03

Oleg Bartunov <oleg(a)sai.msu.su> writes:
> I also wonder why did I get "right" result :) Just repeated the query:

> test=# select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));
>  count
> -------
>    123
> (1 row)

Yeah, that case works (though I think it's unnecessarily slow). The one
that gives the wrong answer is the equivalent form with two AND'ed @@
operators.

regards, tom lane
From: Tom Lane on 29 Jul 2010 12:28

Oleg Bartunov <oleg(a)sai.msu.su> writes:
> On Thu, 29 Jul 2010, Tom Lane wrote:
>> Yeah, that case works (though I think it's unnecessarily slow). The one
>> that gives the wrong answer is the equivalent form with two AND'ed @@
>> operators.

> hmm, that query works too :)

There may be some platform dependency involved --- in particular, you
wouldn't see the issue unless one keystream has two nonlossy TIDs on the
same page as the other one has a lossy TID, so it's going to depend on
the placement of heap rows.

Anyway, I can reproduce it just by loading the given dump, on both 8.4
and HEAD. Will work on a fix.

regards, tom lane
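
The triggering condition described above (one keystream has two nonlossy TIDs on a page where the other has a lossy TID) can be written as a small check over two TID lists. This is an illustrative sketch only: `trigger_pages` is a hypothetical helper, and the (block, offset) tuples with offset 65535 marking a lossy page follow the trace earlier in the thread rather than any PostgreSQL data structure.

```python
from collections import Counter

LOSSY = 65535  # offset that marks a whole-page (lossy) entry


def trigger_pages(a, b):
    """Return sorted pages where one stream has a lossy entry and the
    other stream has at least two exact (nonlossy) TIDs."""
    pages = set()
    for lossy_side, exact_side in ((a, b), (b, a)):
        lossy = {blk for blk, off in lossy_side if off == LOSSY}
        exact = Counter(blk for blk, off in exact_side if off != LOSSY)
        pages |= {p for p in lossy if exact[p] >= 2}
    return sorted(pages)


a = [(1085, 4), (1086, LOSSY), (1087, 4)]
b = [(1083, 11), (1086, 6), (1086, 10), (1087, 10)]
print(trigger_pages(a, b))
```

Whether any such page exists depends on where the heap rows happen to be placed, which fits the observation that the same dump can behave differently from one installation to another.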