From: Greg Smith on
Josh Berkus wrote:
> First, from the nature of the arguments, we need to eventually have both
> versions of SR: delay-based and xmin-pub. And it would be fantastic if
> Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
> ready as well.
>

As I see it, the main technical obstacle here is that a subset of a
feature already on the SR roadmap needs to get built earlier than
expected to pull this off. I don't know about Tom, but I have no
expectation it's possible for me to get up to speed on that code fast
enough to contribute anything there. I expect the thing I'd be most
productive at, as far as moving the release forward goes, is continuing
to test this pair of features and looking for rough edges, which is what
I have planned for the next month.

I'm not even close to finished with generating test cases specifically
probing for bad behavior I suspected after a look at the implementation
details--this is just what I came up with in my first week of that.
Count me in for more testing, but out for significant development here.
It's not what I've got my time allocated for because it's not where I
think I'll be most productive.

> 2) A more usable vacuum_defer_cleanup_age. If it were feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the user the choice between
> slightly degraded performance on the master (acceptable) and lots of
> query cancellations (unacceptable). I'm going to test Greg's case with
> vacuum_defer_cleanup_age used fairly liberally to see if this approach has merit.
>

I've been down that road, and it leads quickly to the following
question: "how can I tell, in time-based units, how old an xid is?" If
there were an easy answer to that question, vacuum_defer_cleanup_age
would already be set in time units. It's the obvious UI to want; it's
just not obvious how to build it internally. Maybe I missed something,
but my guess is that vacuum_defer_cleanup_age is already as good as it's
going to get.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us



From: Josh Berkus on
Greg, Joachim,

> As I see it, the main technical obstacle here is that a subset of a
> feature already on the SR roadmap needs to get built earlier than
> expected to pull this off. I don't know about Tom, but I have no
> expectation it's possible for me to get up to speed on that code fast
> enough to contribute anything there. I expect the thing I'd be most
> productive at, as far as moving the release forward goes, is
> continuing to test this pair of features and looking for rough edges,
> which is what I have planned for the next month.

That's OK with me. I thought you were saying that xmin-pub was going to
be easier than expected. Per my other e-mails, I think we should be
shooting for "good enough, on time" for 9.0, rather than "perfect". We
can't ever get to "perfect" if we don't release software.

Quite frankly, simply telling people "long-running queries on the slave
tend not to be effective; wait for 9.1" is a possibility. If you
consider the limitations and silent failures associated with MySQL
replication, let alone the issues with other Postgres solutions or the
replication in some of the NoSQL databases, "no long-running queries" is
a positively straightforward restriction.

HS+SR is still a tremendous improvement over the options available
previously. We never thought it was going to work for everyone
everywhere, and we shouldn't let our project's OCD tendencies run away
with us.

> I've been down that road and it leads quickly to the following
> question: "how can I tell how old in time-based units an xid is?" If
> there were an easy answer to that question, vacuum_defer_cleanup_age
> would already be set in time units. It's the obvious UI to want, it's
> just not obvious how to build it internally. Maybe I missed something,
> but my guess is that vacuum_defer_cleanup_age is already as good as it's
> going to get.

Well, we could throw this onto the user, if we could get them some
information on how to calculate that number: for example, some way for
them to calculate the number of XIDs consumed per minute via a query, so
they can then set vacuum_defer_cleanup_age appropriately on the master.
Sure, it's clunky, but we've already warned people that 9.0 will be
clunky and hard to administer. And it's no worse than setting
max_fsm_pages used to be.
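
As a rough, untested sketch of what that query could look like (the
xid_sample table name and the 60-second sample window are just made up
for illustration; txid_current() and pg_sleep() are the only server
functions it relies on), you'd sample the XID counter twice, derive an
XIDs-per-minute rate, and multiply by however many minutes of standby
queries you want to protect:

-- Sample the XID counter twice, one minute apart.
-- txid_current() burns one XID per call, which is noise at any
-- realistic transaction rate.
CREATE TEMP TABLE xid_sample AS
    SELECT now() AS ts, txid_current() AS xid;
SELECT pg_sleep(60);
INSERT INTO xid_sample SELECT now(), txid_current();

-- XIDs consumed per minute, plus a candidate vacuum_defer_cleanup_age
-- sized to protect five minutes' worth of standby queries.
SELECT round((max(xid) - min(xid)) /
             (extract(epoch FROM max(ts) - min(ts)) / 60))
           AS xids_per_minute,
       round(5 * (max(xid) - min(xid)) /
             (extract(epoch FROM max(ts) - min(ts)) / 60))
           AS suggested_defer_age
FROM xid_sample;

The second number is what would go into vacuum_defer_cleanup_age in
postgresql.conf on the master.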

However, first we need to test that setting vacuum_defer_cleanup_age
actually helps with the query cancellation problem.

> We're not only canceling queries, we are effectively canceling
> transactions. It seems quite impossible to repeat all queries from a
> transaction that started in the past. One query might be based on or
> include the result of a previous query, and since the data we see now
> has changed since then, the client might want to execute a different
> query after getting a different result from an earlier one...

Sure, except that I don't expect people to be using explicit
transactions as much on the slaves, since they are read-only anyway and
can't even create temp tables. So having the retry logic skip queries
inside an explicit transaction would be an OK option.

> And even if it were possible, how often would you retry? You still
> have no guarantee that your query succeeds the second time. I'd claim
> that if a query failed once, the chances that it fails again are
> higher than the chances that it succeeds on the second try. Moreover,
> if you keep repeating failed queries while new ones continue to arrive
> at a certain rate, you have to process more and more queries on the
> slave, which won't help other queries finish in time, nor will it be
> beneficial for the throughput of the system as a whole...

Well, we'd need to have a limited number of retries, which means a GUC
in recovery.conf:

query_cancel_retry = #

This might default to, say, 2.

However, I'd still like to hear from someone with the requisite
technical knowledge whether capturing and retrying the current query on
a query cancel is even possible.

--Josh Berkus


From: Greg Smith on
Josh Berkus wrote:
> Well, we could throw this onto the user, if we could get them some
> information on how to calculate that number: for example, some way for
> them to calculate the number of XIDs consumed per minute via a query,
> so they can then set vacuum_defer_cleanup_age appropriately on the
> master. Sure, it's clunky, but we've already warned people that 9.0
> will be clunky and hard to administer. And it's no worse than setting
> max_fsm_pages used to be.
>
> However, first we need to test that setting vacuum_defer_cleanup_age
> actually helps with the query cancellation problem.
>

Proving that the setting works as expected is already on my test case
grid, and it seems fine in my limited testing so far. I've started
looking into ways to monitor XID churn so the value can be set more
intelligently, and I'll take care of providing all of that in my next
test case update. My intent here is to take the ideas outlined in my
"Hot Standby Tradeoffs" blog post and turn them into a new documentation
section that makes it clearer where the problem steps are, regardless of
what else happens here. And I need some concrete example of XID burn
rate measurement to finish that job.
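
To sketch the direction I'm heading in (nothing here is in the docs or
my test kit yet, and xid_churn is just a name made up for illustration),
the continuous-monitoring version of the query you're describing is a
small table on the master that gets one row inserted every few minutes
from cron, plus a query that computes the burn rate between samples:

-- Sampling table; feed it from cron with something like:
--   psql -c "INSERT INTO xid_churn DEFAULT VALUES"
CREATE TABLE xid_churn (
    sample_time timestamptz DEFAULT now(),
    xid         bigint      DEFAULT txid_current()
);

-- XID burn rate between consecutive samples
SELECT sample_time,
       (xid - lag(xid) OVER w) /
       (extract(epoch FROM sample_time - lag(sample_time) OVER w) / 60)
           AS xids_per_minute
FROM xid_churn
WINDOW w AS (ORDER BY sample_time)
ORDER BY sample_time;

Watching how that rate swings between peak and off-peak hours is
exactly the input you need before picking a vacuum_defer_cleanup_age
value you can live with all day.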

The main problem with setting vacuum_defer_cleanup_age high isn't
showing that it works; that part is a pretty simple bit of code. It's
when you recognize that it penalizes all cleanup all the time, whether
or not the standby is actually executing a long-running query, that you
notice the second level of pain in increasing it. Returning to the idea
of "how is this different from a site already in production?", it may
very well be the case that a site that sets vacuum_defer_cleanup_age
high enough to support off-peak batch reporting cannot tolerate how that
will impact vacuuming during its peak time of day. The XID export
implementation sidesteps that issue by only deferring cleanup when
queries that require it are actually running, turning this back into a
standard "what's the best time of day to run my big reports?" issue that
people already know how to cope with.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us



From: Josh Berkus on
On 2/28/10 7:00 PM, Greg Smith wrote:
> The main problem with setting vacuum_defer_cleanup_age high isn't
> showing that it works; that part is a pretty simple bit of code. It's
> when you recognize that it penalizes all cleanup all the time, whether
> or not the standby is actually executing a long-running query, that
> you notice the second level of pain in increasing it. Returning to the
> idea of "how is this different from a site already in production?", it
> may very well be the case that a site that sets
> vacuum_defer_cleanup_age high enough to support off-peak batch
> reporting cannot tolerate how that will impact vacuuming during its
> peak time of day. The XID export implementation sidesteps that issue
> by only deferring cleanup when queries that require it are actually
> running, turning this back into a standard "what's the best time of
> day to run my big reports?" issue that people already know how to cope
> with.

I don't think vacuum_defer_cleanup_age is a long-term solution. But we
need *a* solution that does not involve delaying 9.0.

And I think we can measure bloat in a pgbench test, no? When I get a
chance, I'll run one for a couple hours and see the difference that
cleanup_age makes.
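
For the record, the measurement I have in mind is nothing fancier than
comparing dead tuple counts and on-disk sizes of the pgbench tables
after otherwise identical runs, with and without vacuum_defer_cleanup_age
raised; roughly this after each run (an untested sketch, using only the
standard statistics views):

-- Dead tuples and table size for the pgbench tables; compare across
-- runs made with different vacuum_defer_cleanup_age settings.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE relname LIKE 'pgbench_%'
ORDER BY relname;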

--Josh Berkus


From: Greg Smith on
Josh Berkus wrote:
> And I think we can measure bloat in a pgbench test, no? When I get a
> chance, I'll run one for a couple hours and see the difference that
> cleanup_age makes.
>

The test case I attached at the start of this thread runs just the
UPDATE to the tellers table. Running something similar that focuses
just on UPDATEs to the pgbench_accounts table, without the rest of the
steps done by the standard test, is the fastest route to bloat. The
standard test will get there too; it just does a lot of extra work
(SELECT, INSERT) that doesn't impact the results, so it wastes some
resources compared to a targeted bloater script.
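
For anyone who wants to try that without digging up my earlier
attachment, the idea is just a one-statement pgbench custom script
(bloat.sql is a made-up name here, and the exact script I posted before
may differ in its details):

-- bloat.sql: do nothing but UPDATE pgbench_accounts, the fastest
-- route to dead-tuple bloat.  Run with something like:
--   pgbench -n -f bloat.sql -s <scale> -c 8 -T 3600
-- Pass -s to match the scale you initialized with; with custom
-- scripts pgbench doesn't detect it automatically.
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom delta -5000 5000
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

The -n switch just skips the vacuum pgbench normally does before a run;
drop it if you'd rather start each run from a freshly cleaned table.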

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

