From: Greg Stark on
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> I don't see a "substantial additional burden" there.  What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
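The quoted proposal can be pictured as a toy model (all names here are invented for illustration; this is not the actual walsender code):

```python
# Toy model of the proposed feedback loop: the standby sends back one
# number (its oldest xmin), and the master's walsender publishes it as
# its own transaction xmin, holding back VACUUM accordingly.

def standby_oldest_xmin(active_snapshot_xmins, next_xid):
    """The standby reports the oldest xmin among its running queries,
    or its idea of the next XID if it has no queries."""
    return min(active_snapshot_xmins, default=next_xid)

class WalSenderProc:
    """Stand-in for the walsender's PGPROC entry on the master."""
    def __init__(self):
        self.xmin = None  # published transaction xmin

    def receive_feedback(self, reported_xmin):
        # Publish the standby's xmin so the master's oldest-xmin
        # computation (and hence VACUUM) takes it into account.
        self.xmin = reported_xmin

def master_oldest_xmin(backend_xmins, walsender):
    """Master-side oldest xmin across local backends plus the
    walsender's advertised value."""
    xmins = [x for x in backend_xmins if x is not None]
    if walsender.xmin is not None:
        xmins.append(walsender.xmin)
    return min(xmins)
```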

And when we want to support cascading slaves?

Or when you want to bring up a new slave and it suddenly starts
advertising a new xmin that's older than the current oldestxmin?

But in any case if I were running a reporting database I would want it
to just stop replaying logs for a few hours while my big batch report
runs, not cause the master to be unable to vacuum any dead records for
hours. That defeats much of the purpose of running the queries on the
slave.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> writes:
> I don't actually understand how tight synchronization on its own would
> solve the problem. What if the connection to the master is lost? Do you
> kill all queries in the standby before reconnecting?

Sure. So what? They'd have been killed if they individually lost
connections to the master (or the slave), too.

> [ assorted analysis based on WAL contents ]

The problem is all the interactions that are not reflected (historically
anyway) to WAL. We already know about btree page reclamation interlocks
and relcache init files. How many others are there, and how messy and
expensive is it going to be to deal with them?

> If you really think the current approach is unworkable, I'd suggest that
> we fall back to a stop-and-go system, where you either let recovery
> progress or allow queries to run, but not both at the same time. But
> FWIW I don't think the situation is that grave.

I might be wrong. I hope for the sake of the project schedule that I am
wrong. But I'm afraid that we will spend several months beavering away
to try to make the current approach solid and user-friendly, and
eventually conclude that it's a dead end. It would be prudent to have
a Plan B; and it looks to me like closed-loop synchronization is the
best Plan B. Putting off all thought about it for the next release
cycle seems like a recipe for a scheduling disaster.

regards, tom lane

From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> I don't see a "substantial additional burden" there.  What I would
>> imagine is needed is that the slave transmits a single number back
>> --- its current oldest xmin --- and the walsender process publishes
>> that number as its transaction xmin in its PGPROC entry on the master.

> And when we want to support cascading slaves?

So? Fits right in. The walsender on the first-level slave is
advertising an xmin from the second-level one, which will be included in
what's passed back up to the master.
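The cascading case reduces to taking a minimum at each level; a sketch under the same invented names as above (not PostgreSQL code):

```python
# Toy sketch of xmin propagation through cascaded standbys: each level
# reports the minimum of its own queries' xmins and whatever its
# downstream standbys have reported to it.

def reported_xmin(local_query_xmins, downstream_reports, next_xid):
    """Value a standby passes up to the server it replicates from."""
    candidates = list(local_query_xmins) + list(downstream_reports)
    return min(candidates, default=next_xid)

# A second-level standby with a query at xmin 900 reports 900 to the
# first-level standby, which folds it into its own report to the master.
second_level = reported_xmin([900], [], 1000)
first_level = reported_xmin([950], [second_level], 1000)
```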

> Or when you want to bring up a new slave and it suddenly starts
> advertising a new xmin that's older than the current oldestxmin?

How's it going to do that, when it has no queries at the instant
of startup?

> But in any case if I were running a reporting database I would want it
> to just stop replaying logs for a few hours while my big batch report
> runs, not cause the master to be unable to vacuum any dead records for
> hours. That defeats much of the purpose of running the queries on the
> slave.

Well, as Heikki said, a stop-and-go WAL management approach could deal
with that use-case. What I'm concerned about here is the complexity,
reliability, maintainability of trying to interlock WAL application with
slave queries in any sort of fine-grained fashion.
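The stop-and-go idea amounts to making replay and queries mutually exclusive, which removes the need for any fine-grained interlock; a minimal sketch (invented names, not the actual mechanism):

```python
# Minimal sketch of "stop-and-go" WAL management: the standby is either
# replaying WAL or serving queries, never both, so query snapshots can
# never conflict with concurrent WAL application.
import threading

class StopAndGoStandby:
    def __init__(self):
        self._lock = threading.Lock()
        self.mode = "replay"

    def pause_replay_for_queries(self):
        # The big batch report runs here; replay simply waits.
        with self._lock:
            self.mode = "query"

    def resume_replay(self):
        # No queries running; WAL can be applied without interlocks.
        with self._lock:
            self.mode = "replay"

    def can_apply_wal(self):
        return self.mode == "replay"
```

Later PostgreSQL releases grew an explicit knob in this spirit (`pg_wal_replay_pause()`/`pg_wal_replay_resume()`), though without blocking queries during replay.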

regards, tom lane

From: Greg Stark on
On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> How's it going to do that, when it has no queries at the instant
> of startup?

Why shouldn't it have any queries at walreceiver startup? It has
whatever xlog segments were copied from the master, plus any it can
find in the archive, so it could easily reach a consistent point long
before it needs to connect to the master. If you really want to
protect your master from any additional overhead, you don't currently
need to configure a streaming connection at all; you can just use the
file-shipping interface.
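For reference, the file-shipping setup alluded to here involves no walreceiver connection to the master at all; roughly (the archive path is a placeholder):

```
# recovery.conf on the standby -- paths are examples only
standby_mode = 'on'
restore_command = 'cp /mnt/archive/%f "%p"'
```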

--
greg

From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> Why shouldn't it have any queries at walreceiver startup? It has
> whatever xlog segments were copied from the master, plus any it can
> find in the archive, so it could easily reach a consistent point long
> before it needs to connect to the master. If you really want to
> protect your master from any additional overhead, you don't currently
> need to configure a streaming connection at all; you can just use the
> file-shipping interface.

There's *definitely* not going to be enough information in the WAL
stream coming from a master that doesn't think it has HS slaves.
We can't afford to record all that extra stuff in installations for
which it's just useless overhead. BTW, has anyone made any attempt
to measure the performance hit that the patch in its current form is
creating via added WAL entries?

regards, tom lane
