From: Heikki Linnakangas on
Greg Smith wrote:
> Attached is a tar file with some test case demo scripts that demonstrate
> the worst of the problems here IMHO.

Thanks for that! We've been discussing this for ages, so it's nice to
have a concrete example.

> I don't want to belittle that work because it's been important to make
> HS a useful standalone feature, but I feel like that's all triage rather
> than looking for the most robust fix possible.

Ideally the standby would stash away the old pages or tuples somewhere
so that it can still access them even after replaying the WAL records
that remove them from the main storage. I realize that's not going to
happen any time soon because it's hard to do, but that would really be
the most robust fix possible.
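
As a purely imaginary sketch of what such a stash might look like (none of
these structures exist, and the hard part of actually looking tuples up in
it is glossed over entirely):

#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef uint64_t XLogRecPtr;

#define BLCKSZ 8192

/*
 * Imaginary side store: before replaying a cleanup record against a page,
 * the old page image is copied aside and tagged with the WAL position at
 * which it was superseded.  A standby query whose snapshot predates that
 * position could then be redirected to the stashed copy.
 */
typedef struct StashedPage
{
    Oid         relid;          /* relation the page belongs to */
    BlockNumber blkno;          /* block number within the relation */
    XLogRecPtr  superseded_at;  /* WAL position of the record that changed it */
    char        image[BLCKSZ];  /* page contents as they were before replay */
} StashedPage;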

> I don't know how difficult the keepalive feature was expected to be, and
> there's certainly plenty of potential landmines in this whole xid export
> idea.

One such landmine is that the keepalives need to flow from client to
server while the WAL records are flowing from server to client. We'll
have to crack that problem for synchronous replication too, but I think
that alone is a big enough problem to make this 9.1 material.
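
For concreteness, the payload of such a keepalive might look something like
this; the names are invented, nothing like this exists yet:

#include <stdint.h>

typedef uint32_t TransactionId;
typedef int64_t  TimestampTz;

/*
 * Hypothetical message sent periodically from the standby back to the
 * master.  The oldest xmin tells the master which tuple versions standby
 * queries still need to see; the timestamp lets the master notice a standby
 * that has stopped responding.
 */
typedef struct StandbyKeepaliveMessage
{
    TransactionId oldest_xmin;   /* oldest snapshot any standby query holds */
    TimestampTz   send_time;     /* standby's clock when the message was sent */
} StandbyKeepaliveMessage;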

> How to handle situations where the standby goes away for a while,
> such as a network outage, so that it doesn't block the master from ever
> cleaning up dead tuples is a concern.

Yeah, that's another issue that needs to be dealt with. You'd probably
need some kind of a configurable escape valve in the master, to let it
ignore a standby's snapshot once it gets too old.
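
Something along these lines, perhaps, when the master computes how far
VACUUM is allowed to clean up (the names here are invented, and a real
implementation would need wraparound-aware xid comparisons such as
TransactionIdPrecedes rather than plain arithmetic):

#include <stdint.h>

typedef uint32_t TransactionId;

static TransactionId
GetVacuumHorizon(TransactionId local_xmin,
                 TransactionId standby_xmin,
                 uint32_t max_standby_snapshot_lag)
{
    /* Standby not behind us: nothing extra to hold back. */
    if (standby_xmin >= local_xmin)
        return local_xmin;

    /* Standby too far behind: stop honouring its snapshot (the escape valve). */
    if (local_xmin - standby_xmin > max_standby_snapshot_lag)
        return local_xmin;

    /* Otherwise hold cleanup back to what the standby still needs. */
    return standby_xmin;
}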

> But I do know that the current Hot Standby implementation is going to be
> frustrating to configure correctly for people.

Perhaps others who are not as deep into the code as I am will have a
better view on this, but I seriously don't think that's such a big
issue. I think the max_standby_delay setting is quite intuitive and easy
to explain. Sure, it would be better if there were no tradeoff between
killing queries and stalling recovery, but I don't think it'll be that
hard to understand the tradeoff.
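
Boiled down, what the setting asks you to decide is something like this (a
much simplified sketch of the real conflict handling; I'm taking a negative
value to mean "wait forever"):

#include <stdbool.h>

/*
 * When WAL replay is blocked by a conflicting standby query: keep waiting,
 * or cancel the query and let recovery proceed?
 */
static bool
ShouldCancelConflictingQuery(long ms_replay_has_been_delayed,
                             long max_standby_delay_ms)
{
    /* Negative setting taken to mean: always wait, never cancel. */
    if (max_standby_delay_ms < 0)
        return false;

    /* Otherwise recovery wins once the delay budget is used up. */
    return ms_replay_has_been_delayed >= max_standby_delay_ms;
}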

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton on
On 26/02/10 08:33, Greg Smith wrote:
> There are a number of HS
> tunables that interact with one another and, depending on your priorities, a
> few ways you can try to optimize the configuration for what I expect to
> be common use cases for this feature.

> I've written a blog entry at
> http://blog.2ndquadrant.com/en/2010/02/tradeoffs-in-hot-standby-deplo.html
> that tries to explain all that background clearly,

It did too. Thanks for the nice summary people can be pointed at.

> I'm not sure what you might be expecting from the above combination, but
> what actually happens is that many of the SELECT statements on the table
> *that isn't even being updated* are canceled. You see this in the logs:

Hmm - this I'd already figured out for myself. It's just occurred to me
that this could well be the case between databases too. Database A gets
vacuumed, B gets its queries kicked off on the standby. Granted lots of
people just have the one main DB, but even so...

> LOG: restored log file "0000000100000000000000A5" from archive
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be
> removed.
> STATEMENT: SELECT sum(abalance) FROM pgbench_accounts;
>
> Basically, every time a WAL segment appears that wipes out a tuple that
> SELECT expects should still be visible, because the dead row left behind
> by the update has been vacuumed away, the query is canceled. This
> happens all the time the way I've set this up, and I don't feel like
> this is a contrived demo. Having a long-running query on the standby
> while things get updated and then periodically autovacuumed on the
> primary is going to be extremely common in the sorts of production
> systems I expect want HS the most.

I can see pretty much everyone wanting HS+SR. Thousands of small DBs running
on VMs for a start. Free mostly-live backup? Got to be a winner.

Dumb non-hacker question: why do we cancel all transactions rather than
just those with "ACCESS SHARE" on the vacuumed table in question? Is it
the simple fact that we don't know what table this particular section of
WAL affects, or is it the complexity of tracking all this info?

> If you're running a system that also is using Streaming Replication,
> there is a much better approach possible.

> "Requires keep-alives with timestamps to be added to sync rep feature"
>
> If those keep-alives flowed in both directions, and included both
> timestamps *and* xid visibility information, the master could easily be
> configured to hold open xid snapshots needed for long running queries on
> the standby when that was necessary.

Presumably meaning we need *another* config setting to prevent excessive
bloat on a heavily updated table on the master.

--
Richard Huxton
Archonet Ltd


From: Richard Huxton on
On 26/02/10 14:10, Heikki Linnakangas wrote:
>
> Ideally the standby would stash away the old pages or tuples somewhere
> so that it can still access them even after replaying the WAL records
> that remove them from the main storage. I realize that's not going to
> happen any time soon because it's hard to do, but that would really be
> the most robust fix possible.

Something like snapshotting a filesystem, so updates continue while
you're still looking at a static version.

--
Richard Huxton
Archonet Ltd


From: Heikki Linnakangas on
Richard Huxton wrote:
> On 26/02/10 08:33, Greg Smith wrote:
>> I'm not sure what you might be expecting from the above combination, but
>> what actually happens is that many of the SELECT statements on the table
>> *that isn't even being updated* are canceled. You see this in the logs:
>
> Hmm - this I'd already figured out for myself. It's just occurred to me
> that this could well be the case between databases too. Database A gets
> vacuumed, B gets its queries kicked off on the standby.

No, it's per-database already. Only queries in the same database are
canceled.

> Dumb non-hacker question: why do we cancel all transactions rather than
> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
> the simple fact that we don't know what table this particular section of
> WAL affects, or is it the complexity of tracking all this info?

The problem is that even if transaction X doesn't have an (access share)
lock on the vacuumed table at the moment, it might take one in the
future. Simon proposed mechanisms for storing the information about
vacuumed tables in shared memory, so that if X takes the lock later on
it will get canceled at that point, but that's 9.1 material.
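
Roughly, the shared memory side of that might look something like the sketch
below (entirely invented names; real code would also need wraparound-aware
xid comparisons):

#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t TransactionId;

#define MAX_TRACKED_RELATIONS 64

/* One entry per relation that has had dead row versions removed. */
typedef struct RelationCleanupEntry
{
    Oid           relid;                /* relation the cleanup applied to */
    TransactionId latest_removed_xid;   /* newest xid whose rows were removed */
} RelationCleanupEntry;

typedef struct RelationCleanupTable
{
    int                  num_entries;
    RelationCleanupEntry entries[MAX_TRACKED_RELATIONS];
} RelationCleanupTable;

/*
 * Checked when a standby backend acquires a lock on 'relid': the snapshot
 * conflicts only if this particular relation has been cleaned up past the
 * snapshot's xmin, so queries on unrelated tables are left alone.
 */
static int
SnapshotConflictsWithRelation(const RelationCleanupTable *tab,
                              Oid relid, TransactionId snapshot_xmin)
{
    for (int i = 0; i < tab->num_entries; i++)
    {
        if (tab->entries[i].relid == relid &&
            snapshot_xmin <= tab->entries[i].latest_removed_xid)
            return 1;
    }
    return 0;
}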

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard Huxton on
On 26/02/10 14:45, Heikki Linnakangas wrote:
> Richard Huxton wrote:
>> On 26/02/10 08:33, Greg Smith wrote:
>>> I'm not sure what you might be expecting from the above combination, but
>>> what actually happens is that many of the SELECT statements on the table
>>> *that isn't even being updated* are canceled. You see this in the logs:
>>
>> Hmm - this I'd already figured out for myself. It's just occurred to me
>> that this could well be the case between databases too. Database A gets
>> vacuumed, B gets its queries kicked off on the standby.
>
> No, it's per-database already. Only queries in the same database are
> canceled.

That's a relief.

>> Dumb non-hacker question: why do we cancel all transactions rather than
>> just those with "ACCESS SHARE" on the vacuumed table in question? Is it
>> the simple fact that we don't know what table this particular section of
>> WAL affects, or is it the complexity of tracking all this info?
>
> The problem is that even if transaction X doesn't have an (access share)
> lock on the vacuumed table at the moment, it might take one in the
> future. Simon proposed mechanisms for storing the information about
> vacuumed tables in shared memory, so that if X takes the lock later on
> it will get canceled at that point, but that's 9.1 material.

I see - we'd need to age the list of vacuumed tables too, so when the
oldest transactions complete the correct flags get cleared.

Can we not wait to cancel the transaction until *any* new lock is
attempted though? That should protect all the single-statement
long-running transactions that are already underway. Aggregates etc.

--
Richard Huxton
Archonet Ltd
