From: Simon Riggs on 27 May 2010 09:08

The following design offers simplicity, performance and user control over
sync rep waits, including wait-for-apply for HS. It implements Oracle's
Maximum Availability and Maximum Performance options together, rather
than just one or the other: async and sync replication together, under
user control.

* BACKEND

  In xact.c: immediately after fsync during commit logic

    if (sync_rep != NONE)
    {
        max_wakeup_time = commit_timestamp + sync_rep_timeout;
        SetAlarm(max_wakeup_time);    // similar to statement timeout
        WaitOnQueue(commitLSN);
        DisableAlarm();
    }

  In proc.c: in signal handler code

    if (wakeup && waiting_on_commit)
        RemoveFromQueue();

* New process: WALSync (on primary)

  Receives messages from WALAck on the standby and wakes up queued
  backends that have reached the requested commitLSN. If there are
  multiple WALSync processes, they all try to remove backends from the
  head of the queue. The process is started in the same way as WALSender,
  when a request arrives from the standby. (WaitOnQueue() returns
  immediately if no WALSync processes are started, since that means no
  sync rep is available yet.)

* New process: WALAck (on standby)

  Reads shared memory to get the last received and last applied xlog
  locations and sends a message to WALSync on the primary. Loop/sleep
  forever. The values in shared memory are already put there by the
  WALReceiver and Startup processes. Reuse the same message protocol as
  for WALSender->WALReceiver. The process is started after WALReceiver
  connects, if an additional option is set in recovery.conf. It initiates
  a second connection to the primary and issues a slightly different
  startup command to create WALSync.

That's it.

The above needs just two parameters at user level

  sync_rep = none | recv | apply
  sync_rep_timeout = Ns

and an additional parameter in recovery.conf to say whether a standby is
providing the facility for sync replication (as requested by Yeb etc)
(default = yes).

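To make the queue handling above concrete, here is a minimal single-threaded sketch in C. It is only an illustration under stated assumptions: every name in it (Waiter, StandbyAck, ack_satisfies, wake_waiters, expire_waiters) is hypothetical and none of it is PostgreSQL source. It models the queue as a plain array and scans all of it, whereas the proposal removes backends from the head of a shared queue, and it replaces the alarm signal with an explicit "now" argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not PostgreSQL source. */
typedef enum { SYNC_REP_NONE, SYNC_REP_RECV, SYNC_REP_APPLY } SyncRepMode;

/* One backend waiting after its commit record has been fsynced. */
typedef struct {
    uint64_t    commit_lsn;   /* WAL position of the commit record */
    int64_t     deadline_ms;  /* commit timestamp + sync_rep_timeout */
    SyncRepMode mode;         /* level the user asked to wait for */
    bool        waiting;      /* still on the queue? */
} Waiter;

/* Standby progress, as a WALAck message might report it. */
typedef struct {
    uint64_t recv_lsn;        /* last WAL location received */
    uint64_t apply_lsn;       /* last WAL location applied */
} StandbyAck;

/* True if the reported progress covers this waiter's commit. */
bool ack_satisfies(const Waiter *w, const StandbyAck *ack)
{
    uint64_t reached = (w->mode == SYNC_REP_APPLY) ? ack->apply_lsn
                                                   : ack->recv_lsn;
    return w->commit_lsn <= reached;
}

/* WALSync side: release every waiter the ack satisfies.
 * Returns the number of backends released. */
size_t wake_waiters(Waiter *queue, size_t n, const StandbyAck *ack)
{
    size_t woken = 0;

    assert(ack != NULL);
    for (size_t i = 0; i < n; i++) {
        if (queue[i].waiting && ack_satisfies(&queue[i], ack)) {
            queue[i].waiting = false;
            woken++;
        }
    }
    return woken;
}

/* Timeout side: a backend whose deadline has passed leaves the queue
 * and its commit simply proceeds. Returns the number of backends removed. */
size_t expire_waiters(Waiter *queue, size_t n, int64_t now_ms)
{
    size_t expired = 0;

    for (size_t i = 0; i < n; i++) {
        if (queue[i].waiting && now_ms >= queue[i].deadline_ms) {
            queue[i].waiting = false;
            expired++;
        }
    }
    return expired;
}
```

In this sketch a backend that asked for "apply" stays queued until apply_lsn passes its commit, even if recv_lsn already has, which is the distinction between the recv and apply settings. Neither function touches the flow of WAL data; they only decide when a waiting backend is allowed to return.
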
So this is the same as having quorum = 0 or 1 (boring but simple) and
having sync_rep_timeout_action = commit in all cases (clear behaviour in
failure modes, without need for per-standby parameters).

The user specifies how long they wish to wait, but that wait never
changes the flow of WAL data through the cluster, so we don't need to
retune and redesign the existing system for reduced latency. It allows
mixed synchronous and asynchronous replication with *ease*. If we design
things differently that wouldn't be the case.

The design is:

* simple - doesn't require any WAL or libpq changes

* modular - almost completely isolated from existing components in 9.0
  (e.g. WALSender doesn't know or care about WALSync, WALReceiver never
  needs to speak to WALAck directly)

* performant - async and sync can co-exist; WALReceiver never waits; no
  need to retune WALSender operation for synchronous mode

* low latency - the backchannel from standby to primary uses a separate
  connection, so it can operate without slowing down data from the primary

* user centric - allows user control over this feature, an important
  tool for real world performance

* hot standby - implements xid back channel with ease (later phase)

We can hang other options on this later - nothing else is essential.

Development time ~ 1 man month, because similar code exists for all
aspects described above, so no research or internals discussion is
required.

Yes, this is a 3rd design for sync rep, though I think it improves upon
the things I've heard so far from other authors and also includes
feedback from Dimitri, Heikki, Yeb, Alastair.

I'm happy to code this as well, when 9.1 dev starts, and a benchmark
should be interesting also.

--
 Simon Riggs           www.2ndQuadrant.com