From: Robert Haas on 14 Jul 2010 11:13

2010/7/14 Tom Lane <tgl(a)sss.pgh.pa.us>:
> If the combination of columns is actually interesting, there might well
> be an index in place, or the DBA might be willing to create it.

Indexes aren't free, though, nor even close to it.

Still, I think we should figure out the underlying mechanism first and
then design the interface afterwards. One idea I had was a way to say
"compute the MCVs and histogram buckets for this table WHERE
<predicate>". If you can prove the predicate for a particular query, you
can use the more refined statistics in place of the full-table
statistics. This is fine for the breast cancer case, but not so
useful for the zip code/street name case (which seems to be the really
tough one).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
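[Editor's illustration, not part of the original thread.] The predicate-restricted statistics Robert sketches can be shown on toy data: a sub-population selected by a predicate can have a column distribution very different from the whole table, so MCVs computed "WHERE <predicate>" give a much better estimate than multiplying independent full-table selectivities. The table contents and column names below are invented for the example.

```python
from collections import Counter

# Toy rows: (diagnosis, marker) -- the "breast cancer" style case, where
# the predicate diagnosis = 'malignant' selects a sub-population whose
# marker distribution differs sharply from the table as a whole.
rows = ([("malignant", "pos")] * 90 + [("malignant", "neg")] * 10
        + [("benign", "pos")] * 50 + [("benign", "neg")] * 850)

def mcv_fractions(values, n_mcv=2):
    """Most-common values of a column, each with its fraction of the rows."""
    total = len(values)
    return {v: c / total for v, c in Counter(values).most_common(n_mcv)}

# Full-table MCVs for the "marker" column.
full = mcv_fractions([marker for _, marker in rows])

# MCVs computed only over rows WHERE diagnosis = 'malignant'.
partial = mcv_fractions([marker for diag, marker in rows if diag == "malignant"])

n = len(rows)
sel_diag = sum(diag == "malignant" for diag, _ in rows) / n  # 0.1

# Estimate for: WHERE diagnosis = 'malignant' AND marker = 'pos'
naive = n * sel_diag * full["pos"]      # independence assumption: ~14 rows
refined = n * sel_diag * partial["pos"] # predicate-restricted MCVs: ~90 rows
actual = sum(1 for r in rows if r == ("malignant", "pos"))  # 90
```

The independence assumption underestimates by more than 6x here; once the planner can prove the query's predicate implies the one the statistics were gathered under, it can substitute the refined numbers.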
From: marcin mank on 14 Jul 2010 17:57

On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> 2010/7/14 Tom Lane <tgl(a)sss.pgh.pa.us>:
>> If the combination of columns is actually interesting, there might well
>> be an index in place, or the DBA might be willing to create it.
>
> Indexes aren't free, though, nor even close to it.
>
> Still, I think we should figure out the underlying mechanism first and
> then design the interface afterwards. One idea I had was a way to say
> "compute the MCVs and histogram buckets for this table WHERE
> <predicate>". If you can prove the predicate for a particular query, you
> can use the more refined statistics in place of the full-table
> statistics. This is fine for the breast cancer case, but not so
> useful for the zip code/street name case (which seems to be the really
> tough one).

One way of dealing with the zipcode problem is estimating
NDST = count(distinct row(zipcode, street)), i.e. multi-column
ndistinct. Then the planner doesn't have to assume that the selectivity
of an equality condition involving both zipcode and street is the
product of the respective selectivities. As a first cut it can assume
that it will get count(*) / NDST rows, but there are ways to improve it.

Greetings
Marcin Mańk
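[Editor's illustration, not part of the original thread.] Marcin's count(*) / NDST estimate can be worked through on invented data where each street name occurs in exactly one zipcode, the worst case for the independence assumption:

```python
# Toy data: 100 zipcodes, 10 streets per zipcode, 5 addresses per street.
# Every street belongs to exactly one zipcode, so zipcode and street are
# fully correlated.
rows = []
for z in range(100):
    zipcode = f"{z:05d}"
    for s in range(10):
        street = f"{zipcode}-street-{s}"   # street names unique per zipcode
        rows += [(zipcode, street)] * 5

n = len(rows)                              # 5000
nd_zip = len({zipcode for zipcode, _ in rows})    # 100
nd_street = len({street for _, street in rows})   # 1000
nd_pair = len(set(rows))                   # NDST = 1000, not 100 * 1000

# Independence assumption: sel = (1/nd_zip) * (1/nd_street)
naive = n / nd_zip / nd_street             # 0.05 rows -- badly wrong
# Marcin's first cut: count(*) / NDST
ndistinct_est = n / nd_pair                # 5 rows
actual = sum(1 for r in rows if r == ("00000", "00000-street-0"))  # 5
```

With full correlation, multiplying the single-column selectivities is off by a factor of nd_zip (100x here), while count(*) / NDST lands on the true row count. (This is, in spirit, what later appeared in PostgreSQL 10 as CREATE STATISTICS ... (ndistinct).)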
From: Dimitri Fontaine on 14 Jul 2010 19:33

Joshua Tolley <eggyknap(a)gmail.com> writes:
>>> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>>
>> it says X and Y ... the selectivity of joins are what i am most
>> interested in. cross correlation of columns within the same table are
>> just a byproduct. the core thing is: how can i estimate the number
>> of rows returned from a join?
>
> All the discussion of this topic that I've seen has been limited to the single
> table case. The hard problem in that case is coming up with something you can
> precalculate that will actually be useful during query planning, without
> taking too much disk, memory, CPU, or something else. Expanding the discussion
> to include join relations certainly still has valid use cases, but is even
> harder, because you've also got to keep track of precisely how the underlying
> relations are joined, so you know in what context the statistics remain valid.

Well I've been proposing to handle the correlation problem in another
way in some past mails here, and I've been trying to write it down too:

  http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
  http://tapoueh.org/char10.html#sec13

What I propose is to extend ANALYZE to be able to work on a VIEW too,
rather than just a table. The hard parts seem to be:

 a. what stats to record, exploiting the view definition the best we can
 b. how to match a user query against the view definitions we have in
    order to actually use the stats

If you have answers or good ideas :)

Regards,
-- 
dim
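[Editor's illustration, not part of the original thread.] The appeal of ANALYZE-on-a-view is that statistics gathered over the view's output directly observe the join's size and value distribution, which no per-table statistic can. A toy sketch with invented tables and names:

```python
from collections import Counter

# Base tables: customers(customer_id, segment) and orders(order_id, customer_id).
# Only customers 0..19 have orders; each of them has 20 orders.
customers = [(c, "gold" if c % 10 == 0 else "basic") for c in range(100)]
orders = [(o, o % 20) for o in range(400)]

# The "view": SELECT c.segment
#             FROM orders o JOIN customers c USING (customer_id)
segment_of = dict(customers)
view_rows = [segment_of[cid] for _, cid in orders]

# ANALYZE on the view would record stats over its output at analyze time,
# e.g. the observed join cardinality and the MCVs of its columns:
view_stats = {
    "rowcount": len(view_rows),    # 400: the true join size, no guessing
    "mcv": Counter(view_rows),     # segment distribution after the join
}

# At plan time, a query the planner can match against the view definition,
# say WHERE segment = 'gold', is estimated straight from the recorded stats
# (hard part "b." in the mail is exactly this matching step):
est = view_stats["mcv"]["gold"]
actual = sum(1 for seg in view_rows if seg == "gold")
```

In this sketch the estimate is exact because the stats were taken over the same data they estimate; in practice the recorded stats would be a sample and would go stale as the base tables change, which is part of hard point "a.".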
From: Hans-Jürgen Schönig on 15 Jul 2010 06:04

hello ...

a view is already nice but i think it is still too narrow. the problem
is: you don't want a view for every potential join. in addition to that -
ideally there is not much left of a view when it comes to checking for
costs. so, i think, this is not the kind of approach leading to total
success here.

one side question: does anybody happen to know how this is done in
oracle or db2?

many thanks,

	hans

On Jul 15, 2010, at 1:33 AM, Dimitri Fontaine wrote:

> Joshua Tolley <eggyknap(a)gmail.com> writes:
>>>> ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)
>>>
>>> it says X and Y ... the selectivity of joins are what i am most
>>> interested in. cross correlation of columns within the same table are
>>> just a byproduct. the core thing is: how can i estimate the number
>>> of rows returned from a join?
>>
>> All the discussion of this topic that I've seen has been limited to the single
>> table case. The hard problem in that case is coming up with something you can
>> precalculate that will actually be useful during query planning, without
>> taking too much disk, memory, CPU, or something else. Expanding the discussion
>> to include join relations certainly still has valid use cases, but is even
>> harder, because you've also got to keep track of precisely how the underlying
>> relations are joined, so you know in what context the statistics remain valid.
>
> Well I've been proposing to handle the correlation problem in another
> way in some past mails here, and I've been trying to write it down too:
>
>   http://archives.postgresql.org/pgsql-performance/2009-06/msg00118.php
>   http://tapoueh.org/char10.html#sec13
>
> What I propose is to extend ANALYZE to be able to work on a VIEW too,
> rather than just a table. The hard parts seem to be:
>
>  a. what stats to record, exploiting the view definition the best we can
>  b. how to match a user query against the view definitions we have in
>     order to actually use the stats
>
> If you have answers or good ideas :)
>
> Regards,
> -- 
> dim

-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
From: David Fetter on 15 Jul 2010 10:14

On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote:
> hello ...
>
> a view is already nice but i think it is still too narrow.

One sure way to fail is to take on a problem in chunks too large. If
we get even one of the cross-column issues solved by statistics, we'll
be ahead of all our competition, both free and proprietary.

Cheers,
David.
-- 
David Fetter <david(a)fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter(a)gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate