From: Florian Pflug on 12 Jun 2010 07:50

On Jun 12, 2010, at 3:10, Josh Berkus wrote:
>> Hm, but then Robert's failure case is real, and streaming replication might break due to an OS-level crash of the master. Or am I missing something?
>
> 1) Master goes out
> 2) "floating" transaction applied to standby.
> 3) Standby goes out
> 4) Power back on
> 5) master comes up
> 6) standby comes up
>
> It seems like, in that sequence, the standby would have one transaction
> which the master doesn't have, yet the standby thinks it can continue
> getting WAL from the master. Or did I miss something which makes this
> impossible?

I did indeed miss something - with wal_sync_method set to either open_datasync or open_sync, all written WAL is also synced. Since open_datasync is the preferred setting according to http://www.postgresql.org/docs/9.0/static/runtime-config-wal.html#GUC-WAL-SYNC-METHOD, systems supporting open_datasync should be safe.

My Ubuntu 10.04 box running postgres 8.4.4 doesn't support open_datasync though, and hence defaults to fdatasync. Probably because of this fragment in xlogdefs.h

    #if O_DSYNC != BARE_OPEN_SYNC_FLAG
    #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
    #endif

glibc defines O_DSYNC as an alias for O_SYNC and warrants that with
"Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which require only actual file data and metadata necessary to retrieve it to be on disk by the time the system call returns."

If that is true, I believe we should default to open_sync, not fdatasync if open_datasync isn't available, at least on linux.

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
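
The fallback Florian describes can be sketched as a small selection function. This is only an illustration, not the PostgreSQL code: every name below is hypothetical, the flag values are passed in as parameters so that both cases can be exercised on any platform, and it assumes that BARE_OPEN_SYNC_FLAG corresponds to the platform's O_SYNC.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical; they are not PostgreSQL identifiers. */
typedef enum
{
    SKETCH_OPEN_DATASYNC,
    SKETCH_FDATASYNC
} SketchSyncMethod;

/*
 * o_dsync and o_sync stand in for the platform's O_DSYNC and O_SYNC values
 * (assumption: BARE_OPEN_SYNC_FLAG is O_SYNC). open_datasync is only offered
 * when O_DSYNC is a distinct flag; otherwise the default falls back to
 * fdatasync, as described in the message above.
 */
static SketchSyncMethod
sketch_default_sync_method(int o_dsync, int o_sync)
{
    bool have_open_datasync = (o_dsync != o_sync);

    return have_open_datasync ? SKETCH_OPEN_DATASYNC : SKETCH_FDATASYNC;
}

/* Map the enum to the corresponding wal_sync_method setting name. */
static const char *
sketch_sync_method_name(SketchSyncMethod method)
{
    assert(method == SKETCH_OPEN_DATASYNC || method == SKETCH_FDATASYNC);
    return method == SKETCH_OPEN_DATASYNC ? "open_datasync" : "fdatasync";
}
```

Calling sketch_default_sync_method() with two equal flag values models a glibc that aliases O_DSYNC to O_SYNC and yields fdatasync; two distinct values yield open_datasync.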
From: Heikki Linnakangas on 13 Jun 2010 01:24

On 12/06/10 01:16, Josh Berkus wrote:
>
>> Well, we're already not waiting for fsync, which is the slowest part.
>> If there's a performance problem, it may be because FADVISE_DONTNEED
>> disables kernel buffering so that we're forced to actually read the data
>> back from disk before sending it on down the wire.
>
> Well, that's fairly direct to solve, no? Just disable FADVISE_DONTNEED
> if walsenders > 0.

We already do that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From: Greg Smith on 13 Jun 2010 03:36

Florian Pflug wrote:
> glibc defines O_DSYNC as an alias for O_SYNC and warrants that with
> "Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which require only actual file data and metadata necessary to retrieve it to be on disk by the time the system call returns."
>
> If that is true, I believe we should default to open_sync, not fdatasync if open_datasync isn't available, at least on linux.
>

It's not true, because Linux O_SYNC semantics are basically that it's never worked reliably on ext3. See http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php for an example of how terrible the situation would be if O_SYNC were the default on Linux.

We just got a report that a better O_DSYNC is now properly exposed starting on kernel 2.6.33+glibc 2.12: http://archives.postgresql.org/message-id/201006041539.03868.cousinmarc(a)gmail.com and it's possible they may have finally fixed it so it works like it's supposed to. PostgreSQL versions compiled against the right prerequisites will default to O_DSYNC by themselves. Whether or not this is a good thing has yet to be determined.

The last thing we'd want to do at this point is make the old and usually broken O_SYNC behavior suddenly preferred, when the new and possibly fixed O_DSYNC one will be automatically selected when available without any code changes on the database side.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.us
From: Fujii Masao on 14 Jun 2010 04:14

On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I think the failover case might be OK. But if the master crashes and
> restarts, the slave might be left thinking its xlog position is ahead
> of the xlog position on the master.

Right. Unless we perform a failover in this case, the standby might go down because of inconsistency of WAL after restarting the master. To avoid this problem, walsender must wait for WAL to be not only written but also *fsynced* on the master before sending it, as 9.0 does. Though this would degrade the performance, this might be useful for some cases. Should we provide a knob to specify whether to allow the standby to go ahead of the master?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
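
The knob proposed above could look roughly like the following. This is a sketch under stated assumptions, not a patch: the flat 64-bit WAL position, the struct, and the allow_standby_ahead parameter are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flat WAL position; not the real XLogRecPtr layout. */
typedef uint64_t SketchLsn;

typedef struct
{
    SketchLsn written;   /* WAL handed to the OS via write() */
    SketchLsn flushed;   /* WAL known to be fsynced on the master */
} SketchWalProgress;

/*
 * Return the position up to which a walsender may send.
 * With the (hypothetical) knob off, only fsynced WAL is sent, so the
 * standby can never get ahead of what the master keeps after a crash.
 * With the knob on, written-but-unsynced WAL may be sent as well.
 */
static SketchLsn
sketch_send_limit(const SketchWalProgress *progress, bool allow_standby_ahead)
{
    assert(progress->flushed <= progress->written);
    return allow_standby_ahead ? progress->written : progress->flushed;
}
```

With written = 300 and flushed = 200, the limit is 200 when the knob is off and 300 when it is on; the difference is exactly the WAL that an OS-level crash of the master could lose.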
From: Fujii Masao on 14 Jun 2010 04:39

On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> writes:
>> hmm not sure that is what fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync on
>> the slave) and now we could try parallelizing by sending the wal before
>> we have synced locally.
>
> Well, we're already not waiting for fsync, which is the slowest part.

No, currently walsender waits for fsync. Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync. As a result, walsender cannot send WAL that has not been fsynced yet. Should we update xlogctl->LogwrtResult.Write before XLogWrite() performs fsync for 9.0? But that change would cause the problem that Robert pointed out.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php

> If there's a performance problem, it may be because FADVISE_DONTNEED
> disables kernel buffering so that we're forced to actually read the data
> back from disk before sending it on down the wire.

Currently, if max_wal_senders > 0, POSIX_FADV_DONTNEED is not used for WAL files at all.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
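
To make the ordering issue concrete, here is a toy model of one write step. It is a simplification with hypothetical names: published_write only stands in for xlogctl->LogwrtResult.Write, the real XLogWrite() is considerably more involved, and the publish_before_fsync flag exists purely to compare the two orderings discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the master's WAL state; all names are hypothetical. */
typedef struct
{
    uint64_t os_written;       /* passed to write(); may be lost in an OS crash */
    uint64_t fsynced;          /* known durable on disk */
    uint64_t published_write;  /* stand-in for xlogctl->LogwrtResult.Write */
} SketchMaster;

/*
 * Perform one write-then-fsync step up to position `upto`.
 * Returns the position a walsender could observe (and send up to)
 * in the window after write() but before fsync() completes.
 */
static uint64_t
sketch_write_step(SketchMaster *m, uint64_t upto, bool publish_before_fsync)
{
    uint64_t visible_during_fsync;

    assert(upto >= m->os_written);
    m->os_written = upto;
    if (publish_before_fsync)
        m->published_write = upto;

    /* A walsender running at this point sees published_write. */
    visible_during_fsync = m->published_write;

    /* fsync completes; both orderings converge afterwards. */
    m->fsynced = upto;
    m->published_write = upto;
    return visible_during_fsync;
}

/* After an OS crash the master keeps only what was durable at that time. */
static bool
sketch_standby_ahead_after_crash(uint64_t standby_received,
                                 uint64_t durable_at_crash)
{
    return standby_received > durable_at_crash;
}
```

Starting from position 100 and writing up to 200, the publish-after-fsync ordering exposes only 100 during the fsync window, so a crash in that window cannot leave the standby ahead. Publishing before fsync exposes 200, and a crash with only 100 durable reproduces the problem Robert pointed out.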