From: Greg Stark on
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> There's *definitely* not going to be enough information in the WAL
> stream coming from a master that doesn't think it has HS slaves.
> We can't afford to record all that extra stuff in installations for
> which it's just useless overhead.  BTW, has anyone made any attempt
> to measure the performance hit that the patch in its current form is
> creating via added WAL entries?

What extra entries?


--
greg


From: Dimitri Fontaine on
Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> Well, as Heikki said, a stop-and-go WAL management approach could deal
> with that use-case. What I'm concerned about here is the complexity,
> reliability, maintainability of trying to interlock WAL application with
> slave queries in any sort of fine-grained fashion.

Some admin functions were removed from the Hot Standby patch to ease
its integration; among them was a pause() and resume() feature.

I think that offering this explicit control to the user would let them
choose between an HA setup and a reporting setup easily enough: just
pause the replay while running the reports, then resume it to get fresh
data again. If you don't pause, any query can get killed; replay is the
priority.
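For illustration, the workflow could look something like this from
psql, assuming the pause()/resume() admin functions come back in
roughly the form the removed patch code had (the names and the
SELECT-callable form are my guess, not a shipped API):

    -- Hypothetical admin functions from the removed patch code
    SELECT pause();   -- halt WAL replay: reporting queries can no
                      -- longer be cancelled by replay conflicts
    -- ... run the long reporting queries here ...
    SELECT resume();  -- replay catches up, data becomes fresh again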

Now as far as the feedback loop is concerned, I guess the pause()
function would cause the slave to stop publishing any xmin in the
master's procarray so that it's free to vacuum and archive whatever it
wants to.

Should the slave accumulate too much lag, it will resume from the
archive rather than live from the SR link.

How much does that help?

Regards,
--
dim


From: Tom Lane on
Greg Stark <gsstark(a)mit.edu> writes:
> On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> There's *definitely* not going to be enough information in the WAL
>> stream coming from a master that doesn't think it has HS slaves.
>> We can't afford to record all that extra stuff in installations for
>> which it's just useless overhead.  BTW, has anyone made any attempt
>> to measure the performance hit that the patch in its current form is
>> creating via added WAL entries?

> What extra entries?

Locks, just for starters. I haven't read enough of the code yet to know
what else Simon added. In the past it's not been necessary to record
any transient information in WAL, but now we'll have to.
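To give one concrete example of the kind of entry I mean, assuming the
patch WAL-logs AccessExclusiveLock acquisitions so the standby can make
conflicting readers wait (the table name here is made up):

    -- On the master, taking an AccessExclusiveLock would now emit a
    -- WAL record purely for the standby's benefit:
    LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE;
    -- DDL that takes the same lock, e.g. ALTER TABLE or DROP TABLE,
    -- would be logged the same way.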

regards, tom lane


From: Dimitri Fontaine on
Bruce Momjian <bruce(a)momjian.us> writes:
> Doesn't the system already adjust the delay based on the length of slave
> transactions, e.g. max_standby_delay?  It seems there is no need for a
> user switch --- just set max_standby_delay really high.

Well, that GUC looks like it lets you set a compromise between HA and
reporting, not say "do not ever give priority to the replay while I'm
running my reports". At least that's how I understand it.

The feedback loop might get expensive on the master server when running
reporting queries on the slave, unless you can "pause" it explicitly, I
think. I don't see how the system would guess that you're running a
reporting server rather than an HA node; max_standby_delay is just a
way to tell the standby to please be nice in case of abuse.

Regards,
--
dim


From: Greg Stark on
On Fri, Feb 26, 2010 at 11:56 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> This is also the reason why the whole "pause recovery" idea is a fruitless
> path to wander down.  The whole point of this feature is that people have a
> secondary server available for high-availability, *first and foremost*, but
> they'd like it to do something more interesting than leave it idle all the
> time.  The idea that you can hold off on applying standby updates for long
> enough to run seriously long reports is completely at odds with the idea of
> high-availability.

Well, you can go sit in the same corner as Simon with your
high-availability servers.

I want the ability to run large batch queries without any performance
or reliability impact on the primary server.

You can have one or the other, but you can't have both. If you set
max_standby_delay low, you get your high-availability server; if you
set it high, you get a useful reporting server.
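Concretely, in postgresql.conf terms (assuming the max_standby_delay
semantics in the current patch, where -1 means wait forever for
conflicting queries):

    # HA server: cancel conflicting queries quickly so replay never
    # falls far behind
    max_standby_delay = 30s

    # Reporting server: let long queries run to completion and let
    # replay lag as far as it needs to
    max_standby_delay = -1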

If you build sync replication, which we don't have today, and which
will open another huge can of usability worms when we haven't even
finished bottling the two we've already opened, then you lose that lack
of impact on the primary. Suddenly the queries you run on the slaves
cause your production database to bloat. Plus you have extra network
connections which take resources on your master and have to be kept up
at all times, or you lose your slaves.

I think the design constraint of not allowing any upstream data flow
is actually very valuable. Eventually we'll have it for sync
replication, but it's much better that we've built things incrementally
and can be sure that nothing really depends on it for basic
functionality. This is what allows us to know that the slave imposes
no reliability impact on the master. It's what allows us to know that
everything will work identically regardless of whether you have a
walreceiver running or are running off archived log files.

Remember, I wanted to entirely abstract away the walreceiver and allow
multiple WAL communication methods. I think it would make more sense
to use something like Spread to distribute the logs, so the master only
has to send them once and as many slaves as you want can pick them up.
The current architecture doesn't scale very well if you want to have
hundreds of slaves for one master.


--
greg
