[HACKERS] cross column correlation revisted [PgSql]

Prev: Simple hack to get $500 to your home.
Next: cross column correlation revisted

From: =?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= on 14 Jul 2010 06:12

hello everybody,

we are currently facing some serious issues with cross correlation issue.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody - the planner will overestimate the output.
this is the commonly known problem ...

this cross correlation problem can be quite nasty in many many cases.
underestimated nested loops can turn joins into a never ending nightmare and so on and so on.

my ideas is the following:
what if we allow users to specifiy cross-column combinations where we keep separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)

clearly we cannot store correlation for all combinations of all columns so we somehow have to limit it.

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Sch�nig & Sch�nig GmbH
Gr�hrm�hlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

|
Pages: 1
Prev: Simple hack to get $500 to your home.
Next: cross column correlation revisted