From: Heikki Linnakangas on
Richard Huxton wrote:
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

Hmm, that's an interesting thought. You'll still need to somehow tell
the victim backend "you have to fail if you try to acquire any more
locks", but a single per-backend flag in the procarray would suffice.

You could also clear the flag whenever you free the last snapshot in the
transaction (i.e. between each query in read committed mode).
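
In toy form (a standalone sketch only, all names invented -- not the
actual backend code), the idea would look something like this:

/*
 * Toy model: recovery sets a per-backend "conflict pending" flag instead
 * of cancelling the query outright.  The backend only errors out if it
 * tries to take another lock, and the flag is cleared when the last
 * snapshot of the transaction goes away.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct ToyBackend
{
    bool conflict_pending;   /* set by recovery when cleanup must proceed */
    int  active_snapshots;   /* > 0 while a statement is running */
} ToyBackend;

static void
toy_lock_acquire(ToyBackend *be)
{
    if (be->conflict_pending)
    {
        fprintf(stderr, "ERROR: canceling statement due to recovery conflict\n");
        exit(1);
    }
    /* ... normal lock acquisition would go here ... */
}

static void
toy_pop_snapshot(ToyBackend *be)
{
    if (--be->active_snapshots == 0)
        be->conflict_pending = false;    /* between queries: flag forgiven */
}

int
main(void)
{
    ToyBackend be = {false, 1};

    be.conflict_pending = true;   /* recovery wants to apply cleanup */
    toy_pop_snapshot(&be);        /* statement finishes: flag cleared */
    toy_lock_acquire(&be);        /* next statement may lock again */
    puts("no conflict: new statement proceeds");
    return 0;
}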

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton on
Replying to my own post - first sign of madness...

Let's see if I've got the concepts clear here, and hopefully my thinking
it through will help others reading the archives.

There are two queues:
1. Cleanup on the master
2. Replay on the slave

Running write queries on the master adds to both queues.
Running (read-only) queries on the slave prevents removal from both
queues.


There are two interesting measurements of "age"/"size":
1. Oldest item in / length of queue (knowable)
2. How long will it take to clear the queue (estimable at best)

You'd like to know #2 to keep up with your workload. Unfortunately, you
can't know it for certain unless you have control over new incoming
queries (on both master and slave).

You might want four separate GUCs for the two measurements on the two
queues. We currently have two that (sort of) match #1 "Oldest item"
(vacuum_defer_cleanup_age, max_standby_delay).


Delaying replay on a slave has no effect on the master. If a slave falls
too far behind it's responsible for catch-up (via normal WAL archives).

There is no point in delaying cleanup on the master unless it's going to
help one or more slaves. In fact, you don't want to start delaying
cleanup until you have to; otherwise you're wasting your delay time.
This seems to be the problem with vacuum_defer_cleanup_age: if I have a
heavily-updated table and defer vacuuming by a fixed amount, then by the
time any given query starts on the slave the defer window will, on
average, already be half used up (a query starting at a random point in
the window only gets whatever is left of it).

There's also no point in deferring cleanup on the master if the standby
is already waiting on a conflict that will cause its queries to be
cancelled anyway. Not only won't it help, but it might make things worse
since transactions will be cancelled, the conflict will be replayed and
(presumably) queries will be re-submitted only to be cancelled again.

This is what Greg Smith's discussion of the keep-alives was about:
giving the master enough information to be smarter about cleanup (and
making the conflicts more fine-grained).

The situation with deferring at one or both ends of the process just
gets more complicated with multiple slaves. There are all sorts of
unpleasant feedback loops I can envisage there.

For the case of a single slave being used to run long reporting queries,
the ideal scenario would be the following. The master starts deferring
vacuum activity just before the query starts. When that deferral times
out, the slave receives the cleanup info, refuses to replay it and starts
its own delay. This gives you a total available query time of:

  natural time between vacuums + vacuum delay + WAL transfer time +
  standby delay
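
With some entirely made-up numbers plugged in, just to illustrate the
arithmetic:

/* Illustrative arithmetic only -- every number here is invented. */
#include <stdio.h>

int
main(void)
{
    int natural_gap_s   = 300;   /* natural time between vacuums        */
    int vacuum_defer_s  = 600;   /* how long the master defers cleanup  */
    int wal_transfer_s  = 5;     /* WAL transfer time to the slave      */
    int standby_delay_s = 600;   /* the slave's own replay delay        */

    printf("total available query time: ~%d seconds\n",
           natural_gap_s + vacuum_defer_s + wal_transfer_s + standby_delay_s);
    return 0;
}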


I can think of five useful things we should be doing (and might be
already - don't know).

1. On the master, deduce whether the slave is already waiting on a
query. If so, don't bother delaying cleanup. Clearly you don't want to
be signalling hundreds of times a second though. Does the slave pause
fetching via streaming replication if replay is blocked on a query?
Could we signal "half-way to max-age" or some such?

2. Perhaps simpler than trying to make the master smarter: just allow
SET this_transaction_is_probably_a_long_one=true on the slave. That (a)
clears the queue on the slave and (b) sends a signal to the master,
which then starts deferring vacuum (see the sketch after this list).

3. Do a burst of cleanup activity on the master after blocking. This
should concentrate conflicts together when they reach the slave. Perhaps
vacuum_defer_cleanup_age should be vacuum_deferred_queue_size and
measure the amount of work to do, rather than the max age of the oldest
cleanup (if I've understood correctly).

4. Do a burst of replay on the slave after blocking. Perhaps every time
it cancels a transaction it should replay at least half the queued WAL
before letting new transactions start. Or perhaps it replays any vacuum
activity it comes across and then stops. That should sync with #2
assuming the slave doesn't lag the master too much.

5. I've been mixing "defer" and "delay", as do the docs. We should
probably settle on one or the other. I think defer conveys the meaning
more precisely, but what about non-native English speakers?
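
To make (2) a bit more concrete, here's a toy sketch of what setting that
flag might do. None of these names exist anywhere, and the signalling
path back to the master is completely hand-waved:

/*
 * Hypothetical sketch for (2): setting the flag on the slave (a) drains
 * the local replay queue and (b) asks the master to start deferring
 * cleanup.  Toy code only; the real path would ride on the replication
 * connection (walreceiver -> walsender).
 */
#include <stdbool.h>
#include <stdio.h>

static int  replay_queue_depth = 42;    /* pretend pending WAL records */
static bool master_deferring   = false;

static void
drain_replay_queue(void)
{
    printf("replaying %d queued records before the long query starts\n",
           replay_queue_depth);
    replay_queue_depth = 0;
}

static void
tell_master_to_defer_cleanup(void)
{
    master_deferring = true;
    puts("master asked to defer vacuum cleanup");
}

/* what "SET this_transaction_is_probably_a_long_one = true" might trigger */
static void
assign_long_query_flag(bool newval)
{
    if (newval)
    {
        drain_replay_queue();              /* (a) clear the slave's queue */
        tell_master_to_defer_cleanup();    /* (b) signal the master       */
    }
}

int
main(void)
{
    assign_long_query_flag(true);
    printf("master_deferring = %s\n", master_deferring ? "true" : "false");
    return 0;
}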

--
Richard Huxton
Archonet Ltd


From: Josh Berkus on
On 2/26/10 6:57 AM, Richard Huxton wrote:
>
> Can we not wait to cancel the transaction until *any* new lock is
> attempted though? That should protect all the single-statement
> long-running transactions that are already underway. Aggregates etc.

I like this approach. Is it fragile in some non-obvious way?

--Josh Berkus


From: Josh Berkus on
On 2/26/10 10:53 AM, Tom Lane wrote:
> I think that what we are going to have to do before we can ship 9.0
> is rip all of that stuff out and replace it with the sort of closed-loop
> synchronization Greg Smith is pushing. It will probably be several
> months before everyone is forced to accept that, which is why 9.0 is
> not going to ship this year.

I don't think that publishing visibility info back to the master ... and
subsequently burdening the master substantially for each additional
slave ... are what most users want. Certainly for use cases like NTT's,
it is, but not for most of our users.

In fact, I seem to remember specifically discussing the approach of
trying to publish snapshots back to the master, and rejecting it on this
list during the development of SR.

Does anyone know how Oracle solves these issues? Does their structure
(separate rollback log) make it easier for them?

--Josh Berkus


From: Josh Berkus on

> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
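
As I understand it, that mechanism is roughly the following (toy
illustration with invented names, not the actual walsender code):

/*
 * Toy illustration of the quoted idea: the standby periodically reports
 * its oldest running xmin, and the walsender on the master publishes it
 * as its own xmin, so vacuum cannot remove rows the standby still needs.
 * Invented names throughout -- not PostgreSQL source.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t ToyTransactionId;

typedef struct ToyProc
{
    ToyTransactionId xmin;   /* what this "walsender" holds back on the master */
} ToyProc;

static void
toy_process_standby_feedback(ToyProc *walsender_proc,
                             ToyTransactionId standby_xmin)
{
    /* vacuum computes its cutoff over all procs, so publishing the
     * standby's xmin here is enough to hold back the troublesome cleanup */
    walsender_proc->xmin = standby_xmin;
}

int
main(void)
{
    ToyProc walsender = {0};

    toy_process_standby_feedback(&walsender, 123456);
    printf("master now treats xmin %u as still running\n",
           (unsigned) walsender.xmin);
    return 0;
}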

If the main purpose of the slave is long-running queries, though, this
could cause a lot of bloat on the master. That's a special case, but a
reason why we would want to preserve the stop replication functionality.

> I don't doubt that this approach will have its own gotchas that we
> find as we get into it. But it looks soluble. I have no faith in
> either the correctness or the usability of the approach currently
> being pursued.

So, why not start working on it now, instead of arguing about it? It'll
be easy to prove the approach once we have some test code.
