Prev: [HACKERS] Exposing the Xact commit order to the user
Next: [HACKERS] Synchronization levels in SR
From: Jan Wieck on 25 May 2010 16:41 On 5/25/2010 12:03 PM, Simon Riggs wrote: > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: > >> In some systems (data warehousing, replication), the order of commits is >> important, since that is the order in which changes have become visible. >> This information could theoretically be extracted from the WAL, but >> scanning the entire WAL just to extract this tidbit of information would >> be excruciatingly painful. > > I think it would be quite simple to read WAL. WALSender reads the WAL > file after its been flushed, so it would be simple for it to read a blob > of WAL and then extract the commit order from it. > > Overall though, it would be easier and more efficient to *add* info to > WAL and then do all this processing *after* WAL has been transported > elsewhere. Extracting info with DDL triggers, normal triggers, commit > order and everything else seems like too much work to me. Every other > RDBMS has moved away from trigger-based replication and we should give > that serious consideration also. Reading the entire WAL just to find all COMMIT records, then go back to the origin database to get the actual replication log you're looking for is simpler and more efficient? I don't think so. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 25 May 2010 16:50 On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: > On 5/25/2010 12:03 PM, Simon Riggs wrote: > > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: > > > >> In some systems (data warehousing, replication), the order of commits is > >> important, since that is the order in which changes have become visible. > >> This information could theoretically be extracted from the WAL, but > >> scanning the entire WAL just to extract this tidbit of information would > >> be excruciatingly painful. > > > > I think it would be quite simple to read WAL. WALSender reads the WAL > > file after its been flushed, so it would be simple for it to read a blob > > of WAL and then extract the commit order from it. > > > > Overall though, it would be easier and more efficient to *add* info to > > WAL and then do all this processing *after* WAL has been transported > > elsewhere. Extracting info with DDL triggers, normal triggers, commit > > order and everything else seems like too much work to me. Every other > > RDBMS has moved away from trigger-based replication and we should give > > that serious consideration also. > > Reading the entire WAL just to find all COMMIT records, then go back to > the origin database to get the actual replication log you're looking for > is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on *origin*, using exact same code that current WALSender provides, plus some logic to read through the WAL records and extract commit/aborts. That seems much simpler than the proposal you outlined and as SR shows, its low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Jan Wieck on 25 May 2010 17:10 On 5/25/2010 4:50 PM, Simon Riggs wrote: > On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: >> On 5/25/2010 12:03 PM, Simon Riggs wrote: >> > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: >> > >> >> In some systems (data warehousing, replication), the order of commits is >> >> important, since that is the order in which changes have become visible. >> >> This information could theoretically be extracted from the WAL, but >> >> scanning the entire WAL just to extract this tidbit of information would >> >> be excruciatingly painful. >> > >> > I think it would be quite simple to read WAL. WALSender reads the WAL >> > file after its been flushed, so it would be simple for it to read a blob >> > of WAL and then extract the commit order from it. >> > >> > Overall though, it would be easier and more efficient to *add* info to >> > WAL and then do all this processing *after* WAL has been transported >> > elsewhere. Extracting info with DDL triggers, normal triggers, commit >> > order and everything else seems like too much work to me. Every other >> > RDBMS has moved away from trigger-based replication and we should give >> > that serious consideration also. >> >> Reading the entire WAL just to find all COMMIT records, then go back to >> the origin database to get the actual replication log you're looking for >> is simpler and more efficient? I don't think so. > > Agreed, but I think I've not explained myself well enough. > > I proposed two completely separate ideas; the first one was this: > > If you must get commit order, get it from WAL on *origin*, using exact > same code that current WALSender provides, plus some logic to read > through the WAL records and extract commit/aborts. That seems much > simpler than the proposal you outlined and as SR shows, its low latency > as well since commits write to WAL. No need to generate event ticks > either, just use XLogRecPtrs as WALSender already does. > > I see no problem with integrating that into core, technically or > philosophically. > Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested, low volume tables, the origin needs to hang on to the entire WAL log from all 100 other high volume tables? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Jan Wieck on 25 May 2010 22:44 On 5/25/2010 4:16 PM, Tom Lane wrote: > Jan Wieck <JanWieck(a)Yahoo.com> writes: >>> No, I meant how will the *function* know, if a superuser and/or some >>> background process can purge records at any time? > >> The data contains timestamps which are supposedly taken in commit order. > > You can *not* rely on the commit timestamps to be in exact order. > (Perhaps approximate ordering is good enough for what you want here, > but just be careful to not fall into the trap of assuming that they're > exactly ordered.) I am well aware of the fact that commit timestamps within the WAL can go backwards and that the serial numbers of this proposed implementation of commit order can even be different from what the timestamps AND the WAL are saying. As long as the serial number (record position inside of segment) is determined while the transaction still holds all its locks, this is going to be good enough for what async replication users today are used to. Again, it will not magically make it possible to determine a serializable order of actions, that happened from transactions running in read committed isolation level, post mortem. I don't even even think that is possible at all. And I don't think anyone proposed a solution for that problem anyways. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Jan Wieck on 26 May 2010 07:08
On 5/26/2010 7:03 AM, Jan Wieck wrote: > To replicate from one consistent state to the next, the replication > system now selects all log rows between two snapshots. Between here > means it simulates MVCC visibility in the sense of that the writing > transaction was in progress when the first snapshot was taken and had > committed at the second. The resulting WHERE clause looks something like Or it entirely happened between the snapshots, obviously. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |