From: Robert Haas on 8 Jun 2010 07:40
On Sun, Jun 6, 2010 at 10:24 AM, Alexander Korotkov <aekorotkov(a)gmail.com> wrote:
> I think that such parameters don't have optimal value for all the cases;

What makes you think that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Stark on 8 Jun 2010 08:53
On Tue, Jun 8, 2010 at 12:40 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Sun, Jun 6, 2010 at 10:24 AM, Alexander Korotkov
> <aekorotkov(a)gmail.com> wrote:
>> I think that such parameters don't have optimal value for all the cases;
>
> What makes you think that?

Actually the whole signature method is a bit of a crock. I posted about this previously as a tangent on a thread about bloom filters, but I can't find it now.

In short, the "signature" is actually a degenerate bloom filter with one hash function. Sizing bloom filters and choosing the number of hash functions is a solved problem, and while we don't necessarily have all the input variables needed to do it optimally, it's clear that a single hash function is virtually never going to be ideal. These filters are also very small, which leads to high false-positive rates.

To improve matters I think you need to know the number of distinct values that are going to appear in an array. That's something the user would have to provide or we would have to calculate during an ANALYZE run. Then we can select the ideal number of hash functions and size the array to target a chosen false-positive rate.

To *really* improve matters the index structure has to be adjusted to allow for variable-size arrays. Then we can use large filters at higher index levels and smaller filters at lower levels, where they hold fewer values. I don't see how to make that work offhand, though, unless we rescan the heap tuples when we grow the arrays.

--
greg
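The "solved problem" Greg refers to is the standard Bloom filter sizing rule: for n distinct values and a target false-positive rate p, use m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, with the false-positive rate approximated by (1 - e^(-kn/m))^k. The sketch below is not PostgreSQL code; the function names are hypothetical, and it assumes n is known in advance (e.g. supplied by the user or estimated by ANALYZE, as suggested above). It shows how a one-hash-function "signature" compares with an optimally configured filter of the same size.

```python
import math


def false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Approximate Bloom filter false-positive rate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def optimal_size(n_items: int, target_fp: float) -> tuple[int, int]:
    """Return (m_bits, k_hashes) sized for n_items distinct values.

    m = -n * ln(p) / (ln 2)^2, rounded up to whole bits
    k = (m / n) * ln 2, rounded to the nearest integer (at least 1)
    """
    m = math.ceil(-n_items * math.log(target_fp) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


if __name__ == "__main__":
    n = 20  # hypothetical number of distinct values per array
    m, k = optimal_size(n, 0.01)
    print(f"m={m} bits, k={k} hashes")
    print(f"optimal k: fp={false_positive_rate(m, n, k):.4f}")
    # A "signature" is the k=1 special case of the same filter.
    print(f"single hash: fp={false_positive_rate(m, n, 1):.4f}")
```

With 20 distinct values and a 1% target this gives a 192-bit filter with 7 hash functions; using just one hash function on those same 192 bits raises the false-positive rate to roughly 10%. The approximation assumes independent, uniformly distributed hashes.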
From: Robert Haas on 8 Jun 2010 14:41
On Tue, Jun 8, 2010 at 8:53 AM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Tue, Jun 8, 2010 at 12:40 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> On Sun, Jun 6, 2010 at 10:24 AM, Alexander Korotkov
>> <aekorotkov(a)gmail.com> wrote:
>>> I think that such parameters don't have optimal value for all the cases;
>>
>> What makes you think that?
>
> Actually the whole signature method is a bit of a crock.

My point was just that we're unlikely to make any changes to the code without, say, some performance results. I suspect we're unlikely to make the exact change being asked for in any case, but the point is that if you think something isn't right, it's good to back up that position with some data.

--
Robert Haas