Prev: bitmap-index-scan faster than seq-scan on full-table-scan (gin index)
Next: is_absolute_path incorrect on Windows
From: Jesper Krogh on 31 May 2010 17:09 On 2010-05-31 22:09, Tom Lane wrote: > Jesper Krogh<jesper(a)krogh.cc> writes: > >> Conceptually searching for the "full dataset" would always be fastest >> solved by a seq-scan. The query planner enforces this so much, so not >> even "enable_seqscan=off" can convince it to to something else. >> ... >> Would it be possible to implement the "Filtering" using the gin-index and >> a subsequent visibillity-check as on the index-scan? >> > You're failing to make any sense whatsoever. If you're reading the full > dataset, there is no filter condition. If there is a potentially > indexable filter condition, the planner will certainly consider that. > Yes, you're totally right on that (about making sense). But that is because of the "simplified" use-case described. A more elaborate description .. I have a table with has a set of colums attached to it typically used for "sorting" these columns may also be inferred on the table by a typical join condition and a document that is fts-indexed. So the actual use-case is that people query for: "give me the 1000 most recent documents matching <term>" Term may in some cases be hugely trivial, only filtering away 0.001% of the dataset resulting in a "index-scan on a btree date index" filtering on the tsvector column for <term>". Term may also be something really specific only returning a single or a few documents and just pushing a post-sorting to get the ordering. But in the case where the query-planner falls over to a "index-scan" on one of the btree-indices it ends up reading over from the TOAST data. Will the planner consider doing the index-scan(forward or backwards) on a btree-index and filter using a gin-index instead of filtering directly on the tuple-data? (I haven't been able to enforce an query-plan that looks like that). > Personally I think the issue here has got more to do with the > non-immutability of the single-argument form of to_tsquery, which means > it gets re-evaluated at every row during a seqscan. Do your results > change if you work with to_tsquery('english', ...) (or whatever your > default TS config is)? > It is english.. and yes it did indeed change the results. So the expensive case dropped from ~60s to ~28s and the cheap case from ~7.3s to ~4.3s, that is quite surprising that such "small" change can have that huge impact. The single-argument version should be forbidden. But the performance ratio between the two cases is still the same. The test was actually run with the preliminary gincostestimate-patch from Oleg Bartunov so the actual cost estimates match way better now, but that should not impact the actual runtime. Thanks -- Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |