Endgame for all those SELECT FOR UPDATE changes: fix plan node order [PgSql]

Prev: [HACKERS] License clarification: BSD vs MIT
Next: Extraneous newlines in logfile from vacuumdb

From: Robert Haas on 25 Oct 2009 23:29

On Oct 25, 2009, at 10:34 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Now that we've got a hopefully-non-broken implementation of SELECT FOR
> UPDATE locking as a plan node, we can finally contemplate fixing two
> misbehaviors that are called out on the SELECT reference page:
>
> It is possible for a SELECT command using both LIMIT and FOR
> UPDATE/SHARE clauses to return fewer rows than specified by
> LIMIT. This
> is because LIMIT is applied first. The command selects the
> specified
> number of rows, but might then block trying to obtain a lock on
> one or
> more of them. Once the SELECT unblocks, the row might have been
> deleted
> or updated so that it does not meet the query WHERE condition
> anymore,
> in which case it will not be returned.
>
> Similarly, it is possible for a SELECT command using ORDER BY and
> FOR
> UPDATE/SHARE to return rows out of order. This is because ORDER
> BY is
> applied first. The command orders the result, but might then block
> trying to obtain a lock on one or more of the rows. Once the SELECT
> unblocks, one of the ordered columns might have been modified and
> be
> returned out of order. A workaround is to perform SELECT ... FOR
> UPDATE/SHARE and then SELECT ... ORDER BY.
>
> All that we have to do to fix the first one is to put the LockRows
> node
> below the Limit node instead of above it. The solution for the second
> one is to also put LockRows underneath the Sort node, and to regard
> its
> output as unsorted so that a Sort node will certainly be generated.
> (This in turn implies that we should prefer the cheapest-total plan
> for the rest of the query.)

This seems like it could potentially introduce a performance
regression, but the current behavior is so bizarre that it seems like
we should still change it.

> Does anyone have any objections to this? I can't see that it will
> break
> any applications that work today, but maybe I'm missing something.

I'm pretty excited about this. It is a nifty piece of foot-gun
removal. Thanks!

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 25 Oct 2009 23:40

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Oct 25, 2009, at 10:34 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> ... The solution for the second
>> one is to also put LockRows underneath the Sort node, and to regard its
>> output as unsorted so that a Sort node will certainly be generated.
>> (This in turn implies that we should prefer the cheapest-total plan
>> for the rest of the query.)

> This seems like it could potentially introduce a performance
> regression, but the current behavior is so bizarre that it seems like
> we should still change it.

Yeah, it could definitely run slower than the existing code --- in
particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
would tend to become a seqscan-and-sort rather than possibly just
reading one end of an index. However, I quote the old aphorism that
it can be made indefinitely fast if it doesn't have to give the right
answer. The reason the current behavior is fast is it's giving the
wrong answer :-(

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 26 Oct 2009 10:30

Alvaro Herrera <alvherre(a)commandprompt.com> writes:
> Tom Lane escribi�:
>> Yeah, it could definitely run slower than the existing code --- in
>> particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
>> would tend to become a seqscan-and-sort rather than possibly just
>> reading one end of an index. However, I quote the old aphorism that
>> it can be made indefinitely fast if it doesn't have to give the right
>> answer. The reason the current behavior is fast is it's giving the
>> wrong answer :-(

> So this probably merits a warning in the release notes for people to
> check that their queries continue to run with the performance they
> expect.

One problem with this is that there isn't any good way for someone to
get back the old behavior if they want to. Which might be a perfectly
reasonable thing, eg if they know that no concurrent update is supposed
to change the sort-key column. The obvious thing would be to allow

select * from (select * from foo order by col limit 10) ss for update;

to apply the FOR UPDATE last. Unfortunately, that's not how it works
now because the FOR UPDATE will get pushed down into the subquery.
I was shot down when I proposed a related change, a couple weeks ago
http://archives.postgresql.org/message-id/7741.1255278907(a)sss.pgh.pa.us
but it seems like we might want to reconsider.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 26 Oct 2009 10:44

On Mon, Oct 26, 2009 at 10:30 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre(a)commandprompt.com> writes:
>> Tom Lane escribió:
>>> Yeah, it could definitely run slower than the existing code --- in
>>> particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
>>> would tend to become a seqscan-and-sort rather than possibly just
>>> reading one end of an index. However, I quote the old aphorism that
>>> it can be made indefinitely fast if it doesn't have to give the right
>>> answer. The reason the current behavior is fast is it's giving the
>>> wrong answer :-(
>
>> So this probably merits a warning in the release notes for people to
>> check that their queries continue to run with the performance they
>> expect.
>
> One problem with this is that there isn't any good way for someone to
> get back the old behavior if they want to. Which might be a perfectly
> reasonable thing, eg if they know that no concurrent update is supposed
> to change the sort-key column. The obvious thing would be to allow
>
> select * from (select * from foo order by col limit 10) ss for update;
>
> to apply the FOR UPDATE last. Unfortunately, that's not how it works
> now because the FOR UPDATE will get pushed down into the subquery.
> I was shot down when I proposed a related change, a couple weeks ago
> http://archives.postgresql.org/message-id/7741.1255278907(a)sss.pgh.pa.us
> but it seems like we might want to reconsider.

"Shot down" might be an overstatement of the somewhat cautious
reaction that proposal. :-)

Could the desired behavior be obtained using a CTE?

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 26 Oct 2009 11:54

Robert Haas <robertmhaas(a)gmail.com> writes:
> On Mon, Oct 26, 2009 at 10:30 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> One problem with this is that there isn't any good way for someone to
>> get back the old behavior if they want to. �Which might be a perfectly
>> reasonable thing, eg if they know that no concurrent update is supposed
>> to change the sort-key column. �The obvious thing would be to allow
>>
>> select * from (select * from foo order by col limit 10) ss for update;
>>
>> to apply the FOR UPDATE last. �Unfortunately, that's not how it works
>> now because the FOR UPDATE will get pushed down into the subquery.

> Could the desired behavior be obtained using a CTE?

Nope, we push FOR UPDATE into WITHs too. I don't really see any way to
deal with this without some sort of semantic changes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2 3
Prev: [HACKERS] License clarification: BSD vs MIT
Next: Extraneous newlines in logfile from vacuumdb