Query optimization problem [PgSql]

Prev: [HACKERS] Solaris Sparc - dblink regression test failure
Next: [PATCH] "could not reattach to shared memory" on Windows

From: Zotov on 22 Jul 2010 08:30

20.07.2010 18:31, Robert Haas:
> According to the EXPLAIN ANALYZE output, your "slow" query is
> executing in 0.007 ms, and your "fast" query is executing in 0.026 ms
> (i.e. not as quickly as the slow query). Since you mention that it
> takes 7 s further down, I suspect this is not the real EXPLAIN ANALYZE
> output on the real data that you're having a problem with. You might
> have better luck if you post the actual EXPLAIN ANALYZE output here.
> Incidentally, sorry for not responding sooner to your private email -
> I was on vacation last week. But please do keep all replies on-list
> so that everyone can comment.
>
> All that having been said, I think the issue here is that the query
> planner isn't inferring that d1.ID=<some constant> implies d2.ID=<some
> constant>, even though there's a join clause d1.ID=d2.ID. I'm not
> really sure why it isn't doing that... I suspect Tom Lane is the only
> person who can comment intelligently on that, and he's away this week
> (but if anyone else has an idea, feel free to jump in...).
>
Yes, I have a mistake when EXPLAIN ANALYZE without data.. It create
another plan, because seq scan were faster. Now I send results on real
data (1 million rows)

*Slow Query:*
-------------------------------------------------
test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
d1.ID, d2.ID
test-# FROM DocPrimary d1
test-# JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
test-# WHERE (d1.ID=234409763
test(# ) OR (d2.ID=234409763);

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=2222.72..53967.30 rows=1 width=8) (actual
time=6697.847..6697.847 rows=0 loops=1)
Output: d1.id, d2.id
Merge Cond: (d1.id = d2.basedon)
Join Filter: ((d1.id = 234409763) OR (d2.id = 234409763))
-> Index Scan using id_pk on public.docprimary d1
(cost=0.00..37224.48 rows=1076842 width=4) (actual time=0.016..3184.474
rows=1076795 loops=1)
Output: d1.id, d1.basedon
-> Index Scan using basedon_idx on public.docprimary d2
(cost=0.00..46245.14 rows=1076842 width=8) (actual time=0.011..1861.570
rows=235362 loops=1)
Output: d2.id, d2.basedon
Total runtime: 6697.968 ms
(9 rows)
-----------------------------------------------

*Fast Query:*
----------------------------------------------
test=# EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS off )SELECT
d1.ID, d2.ID
test-# FROM DocPrimary d1
test-# JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
test-# WHERE (d1.ID=234409763 and d2.BasedOn=234409763
test(# ) OR (d2.ID=234409763);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=9.01..422.70 rows=1 width=8) (actual
time=0.145..0.145 rows=0 loops=1)
Output: d1.id, d2.id
Join Filter: (((d1.id = 234409763) AND (d2.basedon = 234409763)) OR
(d2.id = 234409763))
-> Bitmap Heap Scan on public.docprimary d2 (cost=9.01..136.90
rows=34 width=8) (actual time=0.141..0.141 rows=0 loops=1)
Output: d2.id, d2.basedon
Recheck Cond: ((d2.basedon = 234409763) OR (d2.id = 234409763))
-> BitmapOr (cost=9.01..9.01 rows=34 width=0) (actual
time=0.136..0.136 rows=0 loops=1)
-> Bitmap Index Scan on basedon_idx (cost=0.00..4.62
rows=33 width=0) (actual time=0.078..0.078 rows=0 loops=1)
Index Cond: (d2.basedon = 234409763)
-> Bitmap Index Scan on id_pk (cost=0.00..4.38 rows=1
width=0) (actual time=0.051..0.051 rows=0 loops=1)
Index Cond: (d2.id = 234409763)
-> Index Scan using id_pk on public.docprimary d1 (cost=0.00..8.39
rows=1 width=4) (never executed)
Output: d1.id, d1.basedon
Index Cond: (d1.id = d2.basedon)
Total runtime: 0.233 ms
(15 rows)
------------------------------------------

I use another fast query:
SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (*d1.ID=234409763 and d2.BasedOn=234409763*) OR (d2.ID=234409763)

Bolded part of query was d2.BasedOn=234409763 I replace it because it
can help find way to optimize it automaticaly

So sorry, but i can`t give programmer to do something in Postgres, because
we don`t use it now as supported DB, we think about it and do some tests.
It`s very hard and slow task (support another DB, now we use FireBird,
and plan use another DB, and look for Postgres and MSSQL, maybe support
it both as free and commercial DB solution)
And in our department only 4 (with me) programmers who can programm on
"System Level", and only one of us (doesn`t me) know C/C++
We all programming on Delphi... If we choose Postgres as free DB
platform then I can think about give programmers for Postgress development.

In so large letters my English stay more bad :)

From: Tom Lane on 27 Jul 2010 13:37

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Tue, Jul 20, 2010 at 11:23 AM, Dimitri Fontaine
> <dfontaine(a)hi-media.com> wrote:
>> The specific diff between the two queries is :
>>
>> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>> - WHERE (d1.ID=234409763) or (d2.ID=234409763)
>> + WHERE (d2.BasedOn=234409763) or (d2.ID=234409763)
>>
>> So the OP would appreciate that the planner is able to consider applying
>> the restriction on d2.BasedOn rather than d1.ID given that d2.BasedOn is
>> the same thing as d1.ID, from the JOIN.
>>
>> I have no idea if Equivalence Classes are where to look for this, and if
>> they're meant to extend up to there, and if that's something possible or
>> wise to implement, though.

> I was thinking of the equivalence class machinery as well. I think
> the OR clause may be the problem. If you just had d1.ID=constant, I
> think it would infer that d1.ID, d2.BasedOn, and the constant formed
> an equivalence class.

Right. Because of the OR, it is *not* possible to conclude that
d2.basedon is always equal to 234409763, which is the implication of
putting them into an equivalence class.

In the example, we do have d1.id and d2.basedon grouped in an
equivalence class. So in principle you could substitute d1.id into the
WHERE clause in place of d2.basedon, once you'd checked that it was
being used with an operator that's compatible with the specific
equivalence class (ie it's in one of the eclass's opfamilies, I think).
The problem is to recognize that such a rewrite would be a win --- it
could just as easily be a big loss.

Even if we understood how to direct the rewriting process, I'm really
dubious that it would win often enough to justify the added planning
time. The particular problem here seems narrow enough that solving it
on the client side is probably a whole lot easier and cheaper than
trying to get the planner to do it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on 28 Jul 2010 03:45

Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> In the example, we do have d1.id and d2.basedon grouped in an
> equivalence class. So in principle you could substitute d1.id into the
> WHERE clause in place of d2.basedon, once you'd checked that it was
> being used with an operator that's compatible with the specific
> equivalence class (ie it's in one of the eclass's opfamilies, I think).
> The problem is to recognize that such a rewrite would be a win --- it
> could just as easily be a big loss.

Ok, that was my feeling too.

> Even if we understood how to direct the rewriting process, I'm really
> dubious that it would win often enough to justify the added planning
> time. The particular problem here seems narrow enough that solving it
> on the client side is probably a whole lot easier and cheaper than
> trying to get the planner to do it.

My overly naive uneducated idea here would be to produce both the plans
and let the planner evaluate their respective costs. Maybe that's what
you mean here by "how to direct the rewriting process". Then we don't
want to generate too many useless plans when you have lots of eclass
around.

This brings back the idea of pondering somehow the optimiser effort
pushed into "solving" a query plan. Like in gcc we can use different
effort targets and we don't know for sure before hand if -O3 will
produce faster code than -O2, all we know is that it will try harder.

Is it possible to imagine having a plan_eclass_permutations default to
false that would activate the discussed behavior here? Ok, I'm not sure
what form should take such a setting, but clearly, there's a need to be
able to impact the optimiser effort.

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 Jul 2010 06:44

On Wed, Jul 28, 2010 at 3:45 AM, Dimitri Fontaine
<dfontaine(a)hi-media.com> wrote:
>> Even if we understood how to direct the rewriting process, I'm really
>> dubious that it would win often enough to justify the added planning
>> time. �The particular problem here seems narrow enough that solving it
>> on the client side is probably a whole lot easier and cheaper than
>> trying to get the planner to do it.
>
> My overly naive uneducated idea here would be to produce both the plans
> and let the planner evaluate their respective costs. Maybe that's what
> you mean here by "how to direct the rewriting process". Then we don't
> want to generate too many useless plans when you have lots of eclass
> around.

The way the planner is set up, you'd have to plan with qual A, then
repeat the entire process with qual B, and then just for good measure
repeat the process with both quals A and B. ISTM you'd triple the
planning time if there were even just one case of this in a particular
query. If you have different ways of generating the same output for a
given rel, you can just throw them all into a bucket and let the
planner work it out. But here you want to have different paths for
the same relation that generate *different output*, and the planner
doesn't understand that concept.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on 28 Jul 2010 06:55

Robert Haas <robertmhaas(a)gmail.com> writes:
> But here you want to have different paths for
> the same relation that generate *different output*, and the planner
> doesn't understand that concept.

Sorry? I though what Equivalence Class provides is the "proving" that
using this qualification or another will *not* affect the output.
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [HACKERS] Solaris Sparc - dblink regression test failure
Next: [PATCH] "could not reattach to shared memory" on Windows