Query optimization problem [PgSql]

Prev: [HACKERS] Solaris Sparc - dblink regression test failure
Next: [PATCH] "could not reattach to shared memory" on Windows

From: Robert Haas on 28 Jul 2010 07:13

On Wed, Jul 28, 2010 at 6:55 AM, Dimitri Fontaine
<dfontaine(a)hi-media.com> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> �But here you want to have different paths for
>> the same relation that generate *different output*, and the planner
>> doesn't understand that concept.
>
> Sorry? I though what Equivalence Class provides is the "proving" that
> using this qualification or another will *not* affect the output.

In a query like...

SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763) or (d2.ID=234409763)

....you're going to scan d1, scan d2, and then join the results. The
scan of d1 is going to produce different results depending on whether
you evaluate or not d1.ID=234409763, and the scan of d2 is going to
produce different results depending on whether or not you evaluate
d2.BasedOn=234409763.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Yeb Havinga on 28 Jul 2010 07:24

Robert Haas wrote:
> On Wed, Jul 28, 2010 at 6:55 AM, Dimitri Fontaine
> <dfontaine(a)hi-media.com> wrote:
>
>> Robert Haas <robertmhaas(a)gmail.com> writes:
>>
>>> But here you want to have different paths for
>>> the same relation that generate *different output*, and the planner
>>> doesn't understand that concept.
>>>
>> Sorry? I though what Equivalence Class provides is the "proving" that
>> using this qualification or another will *not* affect the output.
>>
>
> In a query like...
>
> SELECT d1.ID, d2.ID
> FROM DocPrimary d1
> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>
> ...you're going to scan d1, scan d2, and then join the results. The
> scan of d1 is going to produce different results depending on whether
> you evaluate or not d1.ID=234409763, and the scan of d2 is going to
> produce different results depending on whether or not you evaluate
> d2.BasedOn=234409763.
>
Wouldn't it be relatively easy, to rewrite the filter expression by
adding expressions, instead of replacing constants, in the disjunctive
case, so the example at hand would become:

WHERE (d1.ID=234409763) or (d2.ID=234409763)
AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

regards,
Yeb Havinga

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 07:30

On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga <yebhavinga(a)gmail.com> wrote:
>>> Sorry? I though what Equivalence Class provides is the "proving" that
>>> using this qualification or another will *not* affect the output.
>>
>> In a query like...
>>
>> �SELECT d1.ID, d2.ID
>> �FROM DocPrimary d1
>> � JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>> �WHERE (d1.ID=234409763) or (d2.ID=234409763)
>>
>> ...you're going to scan d1, scan d2, and then join the results. �The
>> scan of d1 is going to produce different results depending on whether
>> you evaluate or not d1.ID=234409763, and the scan of d2 is going to
>> produce different results depending on whether or not you evaluate
>> d2.BasedOn=234409763.
>
> Wouldn't it be relatively easy, to rewrite the filter expression by adding
> expressions, instead of replacing constants, in the disjunctive case, so the
> example at hand would become:
>
> WHERE (d1.ID=234409763) or (d2.ID=234409763)
> AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

Yeah, that could be done, but it's not necessarily a win from a
performance standpoint.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on 28 Jul 2010 07:31

Robert Haas <robertmhaas(a)gmail.com> writes:
> SELECT d1.ID, d2.ID
> FROM DocPrimary d1
> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>
> ...you're going to scan d1, scan d2, and then join the results. The
> scan of d1 is going to produce different results depending on whether
> you evaluate or not d1.ID=234409763, and the scan of d2 is going to
> produce different results depending on whether or not you evaluate
> d2.BasedOn=234409763.

Well I just realised you can't use d2.BasedOn in scanning d1 here. I
don't know what exactly I had in mind previously, but in any case, sorry
for the noise.

I hope the optimiser effort control still hold water nonetheless…
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Yeb Havinga on 28 Jul 2010 08:02

Robert Haas wrote:
> On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga <yebhavinga(a)gmail.com> wrote:
>
>>>> Sorry? I though what Equivalence Class provides is the "proving" that
>>>> using this qualification or another will *not* affect the output.
>>>>
>>> In a query like...
>>>
>>> SELECT d1.ID, d2.ID
>>> FROM DocPrimary d1
>>> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>>>
>>> ...you're going to scan d1, scan d2, and then join the results. The
>>> scan of d1 is going to produce different results depending on whether
>>> you evaluate or not d1.ID=234409763, and the scan of d2 is going to
>>> produce different results depending on whether or not you evaluate
>>> d2.BasedOn=234409763.
>>>
>> Wouldn't it be relatively easy, to rewrite the filter expression by adding
>> expressions, instead of replacing constants, in the disjunctive case, so the
>> example at hand would become:
>>
>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>> AND (d2.BasedOnID=234409763) or (d2.ID=234409763)
>>
>
> Yeah, that could be done, but it's not necessarily a win from a
> performance standpoint.
>
Not necessarily a win, but on the test case no significant increase in
planning time. It somehow feels like a good idea to give the planner as
much information as possible, i.e. for each rel as much baserestrictinfo's.

I earlier forgot parentheses, the correct query is

SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));

by doing this in the rewrite step, triple planning would be avoided. I
suspect that a copyObject of the expression + expression tree mutator
call time during rewrite is negligible compared to plan time, assuming
this is minimal, in this particulare case there doesn't seem to be much
planning time between the three variants.

I ran the script below a number of times, the third time is the one with
expanded expression:

Time: 0.820 ms
Time: 0.859 ms
Time: 0.877 ms
---
Time: 0.617 ms
Time: 0.662 ms
Time: 0.737 ms
---
Time: 0.817 ms
Time: 0.766 ms
Time: 0.826 ms
---
Time: 0.638 ms
Time: 0.700 ms
Time: 0.706 ms
---
Time: 0.463 ms
Time: 0.847 ms
Time: 0.793 ms
---
Time: 0.629 ms
Time: 0.671 ms
Time: 0.703 ms

this was the script (on the relation and index supplied by the OP)

-- warm catalog
explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763) or (d2.ID=234409763);

\timing

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763) or (d2.ID=234409763);

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [HACKERS] Solaris Sparc - dblink regression test failure
Next: [PATCH] "could not reattach to shared memory" on Windows