Listen / Notify - what to do when the queue is full [PgSql]

Prev: tsvector extraction patch
Next: Streaming replication and triggering failover

From: Merlin Moncure on 11 Jan 2010 08:51

On Mon, Jan 11, 2010 at 8:25 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>
>> I'm working on a proof of concept patch to use Joachim's new notify
>> function to introduce case 3. I think this means going through the
>> planner and executor, so I might as well do case 4 as well.
>
> It would be a lot less work to introduce a function like send_notify()
> that could be invoked within a regular SELECT. Pushing a utility

+1

IMO, this neatly solves the problem and addresses some of the concerns
I was raising upthread. A bytea version would be nice but a text only
version is workable.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Arnaud Betremieux on 11 Jan 2010 09:08

On 11/01/2010 14:25, Tom Lane wrote:
> Arnaud Betremieux<arnaud.betremieux(a)keyconsulting.fr> writes:
>
>> 3) My use case : NOTIFY channel 'pay'||'load' (actually NOTIFY
>> channel '<table_name>#'||OLD.id)
>> 4) Taken one step further : NOTIFY channel (SELECT payload FROM payloads
>> WHERE ...)
>>
>
>> I'm working on a proof of concept patch to use Joachim's new notify
>> function to introduce case 3. I think this means going through the
>> planner and executor, so I might as well do case 4 as well.
>>
> It would be a lot less work to introduce a function like send_notify()
> that could be invoked within a regular SELECT. Pushing a utility
> statement through the planner/executor code path will do enough violence
> to the system design that such a patch would probably be rejected out of
> hand.
>
Introducing a send_notify function does sound a lot simpler and cleaner,
and I think I'll try it this way. The only thing that bothers me is the
syntax :

.... DO ALSO SELECT send_notify('payload')
.... DO ALSO SELECT send_notify(a) FROM b

How about a new grammar for NOTIFY <channel> a_expr, which would go
through the rewriter to be transformed as a SELECT ?
so NOTIFY (SELECT a FROM b) would become SELECT send_notify(SELECT a
FROM b) ?

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 19 Jan 2010 02:08

Initial comments:

* compiler warnings

ipci.c: In function 'CreateSharedMemoryAndSemaphores':
ipci.c:228: warning: implicit declaration of function 'AsyncShmemInit'

* 2PC

Adds complexity, and I didn't see any clear, easy solution after
reading the thread. I don't see this as a showstopper, so I'd leave
this until later.

* Hot Standby

It would be nice to have NOTIFY work with HS, but I don't think that's
going to happen this cycle. I don't think this is a showstopper,
either.

* ASCII?

I tend to think that encoding should be handled properly here, or we
should use BYTEA. My reasoning is that TEXT is a datatype, whereas ASCII
is not, and I don't see a reason to invent it now. We might as well use
real TEXT (with encoding support) or use BYTEA which works fine for
ASCII anyway.

* send_notify()

Someone suggested a function like this, which sounds useful to me. Not
a showstopper, but if it's not too difficult it might be worth
including. (Incidentally, the function signature should match the type
of the payload, which is unclear if we're going to use ASCII.)

* AsyncCommitOrderLock

This is a big red flag to me. The name by itself is scary. The fact
that it is held across a series of fairly interesting operations is
even scarier.

As you say in the comments, it's acquired before recording the
transaction commit in the clog, and released afterward. What you
actually have is something like:

AtCommit_NotifyBeforeCommit();

HOLD_INTERRUPTS();

s->state = TRANS_COMMIT;

latestXid = RecordTransactionCommit(false);

TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);

ProcArrayEndTransaction(MyProc, latestXid);

CallXactCallbacks(XACT_EVENT_COMMIT);

ResourceOwnerRelease(TopTransactionResourceOwner,
RESOURCE_RELEASE_BEFORE_LOCKS,
true, true);

AtEOXact_Buffers(true);

AtEOXact_RelationCache(true);

AtEarlyCommit_Snapshot();

AtEOXact_Inval(true);

smgrDoPendingDeletes(true);

AtEOXact_MultiXact();

AtCommit_NotifyAfterCommit();

That is a lot of stuff happening between the acquisition and
release. There are two things particularly scary about this (to me):

* holding an LWLock while performing I/O (in smgrDoPendingDeletes())

* holding an LWLock while acquiring another LWLock (e.g.
ProcArrayEndTransaction())

An LWLock-based deadlock is a hard deadlock -- not detected by the
deadlock detector, and there's not much you can do even if it were --
right in the transaction-completing code. That means that the whole
system would be locked up. I'm not sure that such a deadlock condition
exists, but it seems likely (if not, it's only because other areas of
the code avoided this practice), and hard to prove that it's safe. And
it's probably bad for performance to increase the length of time
transactions are serialized, even if only for cases that involve
LISTEN/NOTIFY.

I believe this needs a re-think. What is the real purpose for
AsyncCommitOrderLock, and can we acheive that another way? It seems
that you're worried about a transaction that issues a LISTEN and
committing not getting a notification from a NOTIFYing transaction
that commits concurrently (and slightly after the LISTEN). But
SignalBackends() is called after transaction commit, and should signal
all backends who committed a LISTEN before that time, right?

* The transaction IDs are used because Send_Notify() is called before
the AsyncCommitOrderLock acquire, and so the backend could potentially
be reading uncommitted notifications that are "about" to be committed
(or aborted). Then, the queue is not read further until that transaction
completes. That's not really commented effectively, and I suspect the
process could be simpler. For instance, why can't the backend always
read all of the data from the queue, notifying if the transaction is
committed and saving to a local list otherwise (which would be checked
on the next wakeup)?

* Can you clarify the big comment at the top of async.c? It's a helpful
overview, but it glosses over some of the touchy synchronization steps
going on. I find myself trying to make a map of these steps myself.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Joachim Wieland on 19 Jan 2010 04:58

Hi Jeff,

thanks a lot for your review. I will reply to your review again in
detail but I'd like to answer your two main questions already now.

On Tue, Jan 19, 2010 at 8:08 AM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> * AsyncCommitOrderLock
>
> I believe this needs a re-think. What is the real purpose for
> AsyncCommitOrderLock, and can we acheive that another way? It seems
> that you're worried about a transaction that issues a LISTEN and
> committing not getting a notification from a NOTIFYing transaction
> that commits concurrently (and slightly after the LISTEN).

Yes, that is exactly the point. However I am not worried about a
notification getting lost but rather about determining the visibility
of the notifications.

In the end we need to be able to know about the order of LISTEN,
UNLISTEN and NOTIFY commits to find out who should receive which
notifications. As you cannot determine if xid1 has committed before or
after xid2 retrospectively I enforced the order by an LWLock and by
saving the list of xids currently being committed.

There are also two examples in
http://archives.postgresql.org/pgsql-hackers/2009-12/msg00790.php
about that issue.

> But SignalBackends() is called after transaction commit, and should signal
> all backends who committed a LISTEN before that time, right?

Yes, any listening backend is being signaled but that doesn't help to
find out about the exact order of the almost-concurrent events that
happened before.

> * The transaction IDs are used because Send_Notify() is called before
> the AsyncCommitOrderLock acquire, and so the backend could potentially
> be reading uncommitted notifications that are "about" to be committed
> (or aborted). Then, the queue is not read further until that transaction
> completes. That's not really commented effectively, and I suspect the
> process could be simpler. For instance, why can't the backend always
> read all of the data from the queue, notifying if the transaction is
> committed and saving to a local list otherwise (which would be checked
> on the next wakeup)?

It's true that the backends could always read up to the end of the
queue and copy everything into the local memory. However you still
need to apply the same checks before you deliver the notifications:
You need to make sure that the transaction has committed and that you
were listening to the channels of the notifications at the time they
got sent / committed. Also you need to copy really _everything_
because you could start to listen to a channel after copying its
uncommitted notifications.

There are other reasons (tail pointer management and signaling
strategy) but in the end it seemed more straightforward to stop as
soon as we hit an uncommitted notification. We will receive a signal
for it eventually anyway and can then start again and read further.

Also I think (but I have no numbers about it) that it makes the
backends work more on the same slru pages.

Joachim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Arnaud Betremieux on 19 Jan 2010 05:54

Regarding the send_notify function, I have been working on it and have a
patch (attached) that applies on top of Joachim's.

It introduces a send_notify SQL function that calls the Async_Notify C
function.

It's pretty straightforward and will need to be refined to take into
account the encoding decisions that are made in Async_Notify, but it
should be a good starting point, and it doesn't introduce much in the
way of complexity.

On a side note, since that might be a discussion for another thread, it
would be nice and more DBA friendly, to have some kind of syntactic
sugar around it to be able to use
NOTIFY channel (SELECT col FROM table);
instead of
SELECT send_notify('channel', (SELECT col FROM table));

Best regards,
Arnaud Betremieux

On 19/01/2010 08:08, Jeff Davis wrote:
> Initial comments:
>
> * compiler warnings
>
> ipci.c: In function 'CreateSharedMemoryAndSemaphores':
> ipci.c:228: warning: implicit declaration of function 'AsyncShmemInit'
>
> * 2PC
>
> Adds complexity, and I didn't see any clear, easy solution after
> reading the thread. I don't see this as a showstopper, so I'd leave
> this until later.
>
> * Hot Standby
>
> It would be nice to have NOTIFY work with HS, but I don't think that's
> going to happen this cycle. I don't think this is a showstopper,
> either.
>
> * ASCII?
>
> I tend to think that encoding should be handled properly here, or we
> should use BYTEA. My reasoning is that TEXT is a datatype, whereas ASCII
> is not, and I don't see a reason to invent it now. We might as well use
> real TEXT (with encoding support) or use BYTEA which works fine for
> ASCII anyway.
>
> * send_notify()
>
> Someone suggested a function like this, which sounds useful to me. Not
> a showstopper, but if it's not too difficult it might be worth
> including. (Incidentally, the function signature should match the type
> of the payload, which is unclear if we're going to use ASCII.)
>
> * AsyncCommitOrderLock
>
> This is a big red flag to me. The name by itself is scary. The fact
> that it is held across a series of fairly interesting operations is
> even scarier.
>
> As you say in the comments, it's acquired before recording the
> transaction commit in the clog, and released afterward. What you
> actually have is something like:
>
> AtCommit_NotifyBeforeCommit();
>
> HOLD_INTERRUPTS();
>
> s->state = TRANS_COMMIT;
>
> latestXid = RecordTransactionCommit(false);
>
> TRACE_POSTGRESQL_TRANSACTION_COMMIT(MyProc->lxid);
>
> ProcArrayEndTransaction(MyProc, latestXid);
>
> CallXactCallbacks(XACT_EVENT_COMMIT);
>
> ResourceOwnerRelease(TopTransactionResourceOwner,
> RESOURCE_RELEASE_BEFORE_LOCKS,
> true, true);
>
> AtEOXact_Buffers(true);
>
> AtEOXact_RelationCache(true);
>
> AtEarlyCommit_Snapshot();
>
> AtEOXact_Inval(true);
>
> smgrDoPendingDeletes(true);
>
> AtEOXact_MultiXact();
>
> AtCommit_NotifyAfterCommit();
>
> That is a lot of stuff happening between the acquisition and
> release. There are two things particularly scary about this (to me):
>
> * holding an LWLock while performing I/O (in smgrDoPendingDeletes())
>
> * holding an LWLock while acquiring another LWLock (e.g.
> ProcArrayEndTransaction())
>
> An LWLock-based deadlock is a hard deadlock -- not detected by the
> deadlock detector, and there's not much you can do even if it were --
> right in the transaction-completing code. That means that the whole
> system would be locked up. I'm not sure that such a deadlock condition
> exists, but it seems likely (if not, it's only because other areas of
> the code avoided this practice), and hard to prove that it's safe. And
> it's probably bad for performance to increase the length of time
> transactions are serialized, even if only for cases that involve
> LISTEN/NOTIFY.
>
> I believe this needs a re-think. What is the real purpose for
> AsyncCommitOrderLock, and can we acheive that another way? It seems
> that you're worried about a transaction that issues a LISTEN and
> committing not getting a notification from a NOTIFYing transaction
> that commits concurrently (and slightly after the LISTEN). But
> SignalBackends() is called after transaction commit, and should signal
> all backends who committed a LISTEN before that time, right?
>
> * The transaction IDs are used because Send_Notify() is called before
> the AsyncCommitOrderLock acquire, and so the backend could potentially
> be reading uncommitted notifications that are "about" to be committed
> (or aborted). Then, the queue is not read further until that transaction
> completes. That's not really commented effectively, and I suspect the
> process could be simpler. For instance, why can't the backend always
> read all of the data from the queue, notifying if the transaction is
> committed and saving to a local list otherwise (which would be checked
> on the next wakeup)?
>
> * Can you clarify the big comment at the top of async.c? It's a helpful
> overview, but it glosses over some of the touchy synchronization steps
> going on. I find myself trying to make a map of these steps myself.
>
> Regards,
> Jeff Davis
>
>
>

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Prev: tsvector extraction patch
Next: Streaming replication and triggering failover