Hot Standby query cancellation and Streaming Replication integration [PgSql]

From: Josh Berkus on 2 Mar 2010 14:03

On 3/2/10 10:30 AM, Bruce Momjian wrote:
> Right now you can't choose "master bloat", but you can choose the other
> two. I think that is acceptable for 9.0, assuming the other two don't
> have the problems that Tom foresees.

Actually, if vacuum_defer_cleanup_age can be used, "master bloat" is an
option. Hopefully I'll get some time for serious testing this weekend.

--Josh Berkus

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 2 Mar 2010 15:11

Bruce Momjian wrote:
>> Right now you can't choose "master bloat", but you can choose the other
>> two. I think that is acceptable for 9.0, assuming the other two don't
>> have the problems that Tom foresees.
>>
>
> I was wrong. You can choose "master bloat" with
> vacuum_defer_cleanup_age, but only crudely because it is measured in
> xids and the master defers no matter what queries are running on the
> slave...

OK with you finding the situation acceptable, so long as it's an
informed decision. From how you're writing about this, I'm comfortable
you (and everybody else still involved here) have absorbed the issues
enough that we're all talking about the same thing now. Since there are
a couple of ugly user-space hacks possible for prioritizing "master
bloat", and nobody is stepping up to work on resolving this via my
suggestion involving better SR integration, seems to me heated
discussion of code changes has come to a resolution of sorts I (and
Simon, just checked) can live with. Sounds like we have three action
paths here:

-Tom already said he was planning a tour through the HS/SR code, I
wanted that to happen with him aware of this issue.
-Josh will continue doing his testing, also better informed about this
particular soft spot.
-I'll continue test-case construction for the problems that there are
still concerns about (pathological max_standby_delay and b-tree split
issues being the top two on that list), and keep sharing particularly
interesting ones here to help everyone else's testing.

If it turns out any of those paths leads to a must-fix problem that
doesn't have an acceptable solution, at least the idea of this as a
"plan B" is both documented and more widely understood than when I
started ringing this particular bell.

I just updated the Open Items list:
http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items to officially
put myself on the hook for the following HS related documentation items
that have come up recently, aiming to get them all wrapped up in time
before or during early beta:

-Update Hot Standby documentation: clearly explain relationships between
the 3 major setup trade-offs, "buffer cleanup lock", notes on which
queries are killed once max_standby_delay is reached, measuring XID
churn on master for setting vacuum_defer_cleanup_age
-Clean up archive_command docs related to recent "/bin/true" addition.
Given that's where I expect people who run into the recently added
pg_stop_backup warning message will end up, noting its value for
escaping from that particular case might be useful too.
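
For the "measuring XID churn" item, a minimal sketch of the arithmetic,
assuming two readings of the master's transaction counter taken some
seconds apart (for example from SELECT txid_current(), which itself
consumes an XID). The function name and the sample values are
hypothetical and not part of PostgreSQL:

```python
def xid_churn_per_second(xid_start, xid_end, elapsed_seconds):
    """Average XIDs consumed per second between two counter readings.

    xid_start / xid_end: transaction counter values sampled on the master
    (assumed to be monotonically increasing, e.g. epoch-extended txids).
    elapsed_seconds: wall-clock time between the two samples.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if xid_end < xid_start:
        raise ValueError("xid_end must not be smaller than xid_start")
    return (xid_end - xid_start) / elapsed_seconds


# Hypothetical readings taken five minutes apart.
rate = xid_churn_per_second(5_000_000, 5_100_000, 300.0)
print(f"about {rate:.1f} XIDs/second")
```

Sampling over a longer, representative period is likely to give a more
useful figure than a single short interval, since churn varies with load.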

To finish airing my personal 9.0 TODO list now that I've gone this far,
I'm also still working on completing the following patches, for which
initial versions have been submitted; I was close to finishing both
before getting side-tracked onto this larger issue:

-pgbench > 4000 scale bug fix:
http://archives.postgresql.org/message-id/4B621BA3.7090306(a)2ndquadrant.com
-Improving the logging/error reporting/no timestamp issues in pg_standby
re-raised recently by Selena:
http://archives.postgresql.org/message-id/2b5e566d1001250945oae17be8n6317f827e3bd7492(a)mail.gmail.com

If nobody else claims them as something they're working on before, I
suspect I'll then move onto building some of the archiver UI
improvements discussed most recently as part of the "pg_stop_backup does
not complete" thread, despite Heikki having crushed my dreams of a
simple solution to those by pointing out the shared memory
limitation involved.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

From: Josh Berkus on 2 Mar 2010 19:22

On 3/2/10 12:47 PM, Marc Munro wrote:
> To take it further still, if vacuum on the master could be prevented
> from touching records that are less than max_standby_delay seconds old,
> it would be safe to apply WAL from the very latest vacuum. I guess HOT
> could be handled similarly though that may eliminate much of the
> advantage of HOT updates.

Aside from the inability to convert between transaction count and time,
isn't this what vacuum_defer_cleanup_age is supposed to do? Or does it
not help with HOT updates?

--Josh Berkus

From: Josh Berkus on 10 Mar 2010 01:29

All,

I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than max_standby_delay, or maybe in combination with it.

As a first test, I set up a deliberately pathological situation with
pgbench and a max_standby_delay of 1 second. This allowed me to trigger
query cancel on a relatively simple reporting query; in fact, to make it
impossible to complete.

Then I increased vacuum_defer_cleanup_age to 100000, which represents
about 5 minutes of transactions on the test system. This eliminated all
query cancels for the reporting query, which takes an average of 10s.
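
As a rough check of the numbers above, the conversion between an XID
count and a time window can be sketched as below. This is only
back-of-envelope arithmetic under the assumption of a steady transaction
rate; the function names and the safety_factor parameter are hypothetical
and are not PostgreSQL settings:

```python
import math


def defer_age_for(xids_per_second, query_seconds, safety_factor=1.0):
    """XIDs consumed while a standby query of the given length runs.

    safety_factor is a hypothetical multiplier for headroom; 1.0 means none.
    """
    return math.ceil(xids_per_second * query_seconds * safety_factor)


def seconds_covered(defer_age, xids_per_second):
    """Approximate time window that a given deferral (in XIDs) spans."""
    if xids_per_second <= 0:
        raise ValueError("xids_per_second must be positive")
    return defer_age / xids_per_second


# Figures from the test described above: 100000 XIDs in about 5 minutes.
rate = 100000 / 300.0
print(defer_age_for(rate, 10))                # XIDs spanned by a 10 s query
print(round(seconds_covered(100000, rate)))   # window covered by 100000 XIDs
```

Because the master's transaction rate is rarely constant, any value
derived this way is a starting point for testing rather than a guarantee
against query cancels.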

Next is a database bloat test, but I'll need to do that on a system with
more free space than my laptop.

--Josh Berkus

From: Josh Berkus on 10 Mar 2010 13:59

On 3/10/10 3:38 AM, Greg Stark wrote:
> I think that means that a
> vacuum_defer_cleanup of up to about 100 or so (it depends on the width
> of your counter record) might be reasonable as a general suggestion
> but anything higher will depend on understanding the specific system.

100 wouldn't be useful at all. It would increase bloat without doing
anything about query cancel except on a very lightly used system.

> With vacuum_defer_cleanup that will no longer be true.
> It will be as if you always have a query lasting n transactions in
> your system at all times.

Yep, but until we get XID-publish-to-master working in 9.1, I think it's
probably the best we can do. At least it's no *worse* than having a
long-running query on the master at all times.

--Josh Berkus
