From: Dimitri Fontaine on
Josh Berkus <josh(a)agliodbs.com> writes:
> 2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
> user to configure the master to not vacuum records less than, say, 5
> minutes dead, then that would again offer the choice to the user of
> slightly degraded performance on the master (acceptable) vs. lots of
> query cancel (unacceptable). I'm going to test Greg's case with
> vacuum_defer_cleanup_age used fairly liberally to see if this approach has
> merit.

I think that to tie any time-based notion of age to the XID flow, you
need a ticker. We already took the txid and txid_snapshot types and
functions from Skytools, which took them from Slony.

Maybe we could consider borrowing pgqd, the C version of the ticker,
so that users could specify in human time how long a dead transaction
is allowed to remain in the heap?

http://github.com/markokr/skytools-dev/tree/master/sql/ticker/
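
To make the idea concrete, here is a minimal sketch in Python/psycopg2
of what a ticker buys you: sample the XID counter over time, then
translate a human-time retention target into an XID count for
vacuum_defer_cleanup_age. The DSN and the sampling interval are
assumptions, and a real ticker like pgqd does considerably more:

    # Minimal sketch, not Skytools' actual ticker protocol: estimate
    # the XID consumption rate and convert "5 minutes" into XIDs.
    import time
    import psycopg2

    def current_xid(cur):
        # txid_current() is one of the functions we took from Skytools.
        # Note that calling it consumes an XID, so sample sparingly.
        cur.execute("SELECT txid_current()")
        return cur.fetchone()[0]

    conn = psycopg2.connect("dbname=postgres")   # hypothetical DSN
    conn.autocommit = True   # each sample needs its own transaction,
                             # or txid_current() repeats the same XID
    cur = conn.cursor()

    xid1, t1 = current_xid(cur), time.time()
    time.sleep(60)                         # assumed sampling interval
    xid2, t2 = current_xid(cur), time.time()

    xids_per_second = (xid2 - xid1) / (t2 - t1)
    target_seconds = 300                   # keep dead rows ~5 minutes
    print("vacuum_defer_cleanup_age = %d"
          % (xids_per_second * target_seconds))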

Regards,
--
dim


From: Joachim Wieland on
On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> 1) Automated retry of cancelled queries on the slave.  I have no idea
> how hard this would be to implement, but it makes the difference between
> writing lots of exception-handling code for slave connections
> (unacceptable) to just slow response times on the slave (acceptable).

We're not only canceling queries, we are effectively canceling
transactions. It seems quite impossible to repeat all the queries of a
transaction that started in the past: one query may be, or may depend
on, the result of a previous query, and since the data we see now has
changed since then, the client might well want to execute a different
query once it gets a different result from an earlier one...

And even if it were possible, how often would you retry? You still
have no guarantee that your query succeeds the second time; I'd claim
that if a query failed once, the chances that it fails again are
higher than the chances that it succeeds on the second attempt.
Moreover, if you keep repeating failed queries while new queries
arrive at a steady rate, the slave has to process more and more work,
which neither helps other queries finish in time nor benefits the
throughput of the system as a whole...
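
To make the limits concrete, here is roughly what the best case for
automated retry would look like, as a hedged Python/psycopg2 sketch: a
single read-only statement with no side effects, retried a bounded
number of times. The SQLSTATE check is an assumption (retryable
recovery conflicts appear to be reported as 40001; verify against the
actual code), and anything stateful runs into exactly the problems
above:

    import time
    import psycopg2

    RETRYABLE = ("40001",)   # assumed SQLSTATE for retryable conflicts

    def run_with_retry(conn, sql, params=None, attempts=3, backoff=1.0):
        # Only sane for one idempotent, side-effect-free SELECT.
        for i in range(attempts):
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                return cur.fetchall()
            except psycopg2.Error as e:
                conn.rollback()   # the whole transaction is gone anyway
                if e.pgcode not in RETRYABLE or i == attempts - 1:
                    raise
                time.sleep(backoff * (i + 1))   # back off, then retry

And even then, each retry competes with the new queries that have
arrived in the meantime.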


I fully agree with what you say about user expectations: we need to
assume that many programs are not prepared for failures of "simple"
read-only queries, because in the past those have always worked...


> Another thing to keep in mind in these discussions is the
> inexpensiveness of servers today. This means that, if slaves have poor
> performance, that's OK; one can always spin up more slaves. But if each
> slave imposes a large burden on the master, then that limits your
> scalability.

The burden the xmin-publication feature places on the master is not a
function of the number of slaves; it is set by the longest-running
query on whichever slave happens to be running it. So your argument
applies to both cases... To minimize the burden on the master, add
more slaves so that you can run your most expensive queries in less
time :-)
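
A toy illustration of that point, with invented numbers: the master
can only clean up rows that are dead to the oldest xmin any slave
reports, so the horizon is a min() over slaves, not a sum:

    # Hypothetical xmins reported by three slaves.
    slave_xmins = {
        "slave1": 41230,   # short reporting queries
        "slave2": 41228,
        "slave3": 38000,   # one long-running query holds this back
    }
    # The master's cleanup horizon is set by the single oldest xmin:
    print(min(slave_xmins.values()))   # 38000; more slaves running
                                       # short queries would not move it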


Joachim


From: Robert Haas on
On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> Greg, Joachim,
>
>> As I see it, the main technical obstacle here is that a subset of a
>> feature already on the SR roadmap needs to get built earlier than
>> expected to pull this off.  I don't know about Tom, but I have no
>> expectation it's possible for me to get up to speed on that code fast
>> enough to contribute anything there.  I expect the thing I'd be most
>> productive at as far as moving the release forward is to continue
>> testing this pair of features looking for rough edges, which is what I
>> have planned for the next month.
>
> That's OK with me.  I thought you were saying that xmin-pub was going to
> be easier than expected.  Per my other e-mails, I think that we should
> be shooting for "good enough, on time" for 9.0, rather than "perfect".
>  We can't ever get to "perfect" if we don't release software.

I agree. It seems to me that the right long-term fix for the problem
of query cancellations on the slave is to give the slave the ability
to save multiple versions of relation pages where necessary, so that
older snapshots can continue to be used even after the conflicting WAL
has been applied. However, I'm pretty sure that's going to be a very
difficult project, unlikely to be coded by anyone any time soon, let
alone merged. Until that happens, we're going to force people to pick
from a fairly unappealing menu of options: postpone WAL replay for
long periods of time, cancel queries (perhaps even queries seemingly
unrelated to what changed on the master), or bloat the master. All of
those options are seriously unpleasant.
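
To illustrate what I mean (a toy sketch only, nothing like a real
design, and all names and numbers invented): keep several versions of
each page, tagged with the WAL position that produced them, and have
each query read the newest version not newer than its snapshot:

    pages = {}   # page_id -> list of (lsn, image), oldest first

    def write_page(page_id, lsn, image):
        # WAL replay appends a new version instead of overwriting.
        pages.setdefault(page_id, []).append((lsn, image))

    def read_page(page_id, snapshot_lsn):
        # A query sees the newest version at or before its snapshot.
        versions = [v for v in pages.get(page_id, [])
                    if v[0] <= snapshot_lsn]
        return versions[-1][1] if versions else None

    write_page(1, 100, "row as of lsn 100")
    write_page(1, 200, "row after conflicting WAL at lsn 200")
    print(read_page(1, snapshot_lsn=150))   # old snapshot still works

The hard parts, of course, are doing this at the buffer-manager level
and deciding when old versions can be discarded.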

I think, though, that we have to treat this as an architectural
change on the order of the Windows port, or maybe something even more
significant. It is going to take several releases for this feature to
be well understood, stable, and equipped with all the options we'd
like it to have. It wouldn't surprise me if we get to
10.0 before we really have truly seamless replication. I don't expect
Slony or Londiste or any of the other solutions that are out there now
to get kicked to the curb by PG 9.0. Still, a journey of a thousand
miles begins with the first step. Simon and many others have put a
great deal of time and energy into getting us to the point where we
are now, and if we let the fact that we haven't reached our ultimate
goal keep us from putting what we have out there in front of our
customers, I think we're going to regret that.

I think the thing to do is to reposition our PR around these features.
We should maybe even go so far as to call them "beta" or
"experimental". We shouldn't tell people - this is going to be
totally awesome. We should tell people - this is a big improvement,
and it's still got some pretty significant limitations, but it's good
stuff and it's going in a good direction. Overhyping what we have
today is not going to be good for the project, and I'm frankly quite
afraid that nothing we can possibly code between now and the release
is going to measure up to what people are hoping for. We need to set
our own expectations, and those of our customers, at a level at which
they can be met.

> Quite frankly, simply telling people that "long-running queries on the
> slave tend not to be effective, wait for 9.1" is a possibility.

Yep.

> HS+SR is still a tremendous improvement over the options available
> previously.  We never thought it was going to work for everyone
> everywhere, and shouldn't let our project's OCD tendencies run away with us.

Yep.

> However, I'd still like to hear from someone with the requisite
> technical knowledge whether capturing and retrying the current query in
> a query cancel is even possible.

I'm not sure who you want to hear from here, but I think that's a dead end.

....Robert


From: Greg Stark on
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> I don't think that defer_cleanup_age is a long-term solution.  But we
> need *a* solution which does not involve delaying 9.0.

So I think the primary solution currently is to raise max_standby_delay.

However, there is a concern with max_standby_delay. Suppose you set
it to, say, 300s, and then run a 300s query on the slave, causing the
slave to fall 299s behind. Now you start a new query on the slave: it
gets a snapshot based on the point in time the slave is currently at.
If it hits a conflict, it will have only 1s to finish before the
conflict causes the query to be cancelled.

In short, I think that in the current setup there is no safe value of
max_standby_delay, short of -1, that will prevent query cancellations.
If the slave receives a constant stream of queries and always has at
least one query running, it can end up running continuously at
max_standby_delay minus epsilon behind the master, cancelling queries
left and right, regardless of how large max_standby_delay is.
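
Here is a toy simulation of that feedback loop (all numbers invented):
with max_standby_delay = 300s and a constant stream of queries, the
standby settles at about 299s behind, and every query after the first
gets only about a second before it is cancelled:

    D = 300.0        # max_standby_delay, in seconds
    catch_up = 1.0   # WAL the standby applies between queries
    lag = 0.0        # how far the standby trails the master

    for i in range(5):
        # A new query conflicts with WAL that is already `lag` seconds
        # old, so only the remaining delay budget is available to it:
        grace = max(D - lag, 0.0)
        print("query %d: cancelled after %.0fs" % (i, grace))
        # Replay pauses while the query runs, then barely catches up:
        lag = max(min(lag + grace, D) - catch_up, 0.0)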

To resolve this I think you would have to give the slave some chance
to catch up: something like refusing to hand out a snapshot older than
max_standby_delay/2, and instead making new queries wait until the
existing ones finish and the slave has caught up enough to see a more
recent snapshot. The problem is that this would make response times on
the slave very unpredictable and variable: a single long-lived query
could pause replay for a big chunk of max_standby_delay and prevent
any new query from starting.
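
As a sketch of that policy (toy model only; the lag and replay
behaviour are stand-ins): make a new query wait until the snapshot it
would get is younger than max_standby_delay/2:

    import random

    D = 300.0     # max_standby_delay, in seconds
    lag = 299.0   # standby currently 299s behind (invented)

    def replay_one_chunk():
        # Stand-in for applying pending WAL while no query blocks it.
        global lag
        lag = max(lag - random.uniform(20, 60), 0.0)

    def acquire_snapshot():
        # Refuse snapshots older than D/2; wait for replay instead.
        while lag > D / 2.0:
            replay_one_chunk()
        return lag   # "snapshot age" in this toy model

    print("query starts with a %.0fs-old snapshot" % acquire_snapshot())

The cost is visible right in the sketch: acquire_snapshot() can block
for a long time, which is exactly the unpredictable response time
problem.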

Does anyone see any way to guarantee that the slave gets a chance to
replay and new snapshots will become visible without freezing out new
queries for extended periods of time?

--
greg


From: Greg Stark on
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> Completely aside from that, how many users are going to be happy with a
> slave server which is constantly 5 minutes behind?
>

Uhm, well, all the ones who are happy with our current warm standby
setup, for one?

And all the ones who are looking for a standby reporting server rather
than a high availability DR site.

For what it's worth, Oracle has an option to hold the standby
intentionally n minutes behind, and I've seen it set to 5 minutes.

--
greg
