Exposing the Xact commit order to the user [PgSql]

Prev: [HACKERS] Exposing the Xact commit order to the user
Next: [HACKERS] Synchronization levels in SR

From: Jan Wieck on 25 May 2010 16:41

On 5/25/2010 12:03 PM, Simon Riggs wrote:
> On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
>
>> In some systems (data warehousing, replication), the order of commits is
>> important, since that is the order in which changes have become visible.
>> This information could theoretically be extracted from the WAL, but
>> scanning the entire WAL just to extract this tidbit of information would
>> be excruciatingly painful.
>
> I think it would be quite simple to read WAL. WALSender reads the WAL
> file after its been flushed, so it would be simple for it to read a blob
> of WAL and then extract the commit order from it.
>
> Overall though, it would be easier and more efficient to *add* info to
> WAL and then do all this processing *after* WAL has been transported
> elsewhere. Extracting info with DDL triggers, normal triggers, commit
> order and everything else seems like too much work to me. Every other
> RDBMS has moved away from trigger-based replication and we should give
> that serious consideration also.

Reading the entire WAL just to find all COMMIT records, then go back to
the origin database to get the actual replication log you're looking for
is simpler and more efficient? I don't think so.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on 25 May 2010 16:50

On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:
> On 5/25/2010 12:03 PM, Simon Riggs wrote:
> > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
> >
> >> In some systems (data warehousing, replication), the order of commits is
> >> important, since that is the order in which changes have become visible.
> >> This information could theoretically be extracted from the WAL, but
> >> scanning the entire WAL just to extract this tidbit of information would
> >> be excruciatingly painful.
> >
> > I think it would be quite simple to read WAL. WALSender reads the WAL
> > file after its been flushed, so it would be simple for it to read a blob
> > of WAL and then extract the commit order from it.
> >
> > Overall though, it would be easier and more efficient to *add* info to
> > WAL and then do all this processing *after* WAL has been transported
> > elsewhere. Extracting info with DDL triggers, normal triggers, commit
> > order and everything else seems like too much work to me. Every other
> > RDBMS has moved away from trigger-based replication and we should give
> > that serious consideration also.
>
> Reading the entire WAL just to find all COMMIT records, then go back to
> the origin database to get the actual replication log you're looking for
> is simpler and more efficient? I don't think so.

Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using exact
same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jan Wieck on 25 May 2010 17:10

On 5/25/2010 4:50 PM, Simon Riggs wrote:
> On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:
>> On 5/25/2010 12:03 PM, Simon Riggs wrote:
>> > On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
>> >
>> >> In some systems (data warehousing, replication), the order of commits is
>> >> important, since that is the order in which changes have become visible.
>> >> This information could theoretically be extracted from the WAL, but
>> >> scanning the entire WAL just to extract this tidbit of information would
>> >> be excruciatingly painful.
>> >
>> > I think it would be quite simple to read WAL. WALSender reads the WAL
>> > file after its been flushed, so it would be simple for it to read a blob
>> > of WAL and then extract the commit order from it.
>> >
>> > Overall though, it would be easier and more efficient to *add* info to
>> > WAL and then do all this processing *after* WAL has been transported
>> > elsewhere. Extracting info with DDL triggers, normal triggers, commit
>> > order and everything else seems like too much work to me. Every other
>> > RDBMS has moved away from trigger-based replication and we should give
>> > that serious consideration also.
>>
>> Reading the entire WAL just to find all COMMIT records, then go back to
>> the origin database to get the actual replication log you're looking for
>> is simpler and more efficient? I don't think so.
>
> Agreed, but I think I've not explained myself well enough.
>
> I proposed two completely separate ideas; the first one was this:
>
> If you must get commit order, get it from WAL on *origin*, using exact
> same code that current WALSender provides, plus some logic to read
> through the WAL records and extract commit/aborts. That seems much
> simpler than the proposal you outlined and as SR shows, its low latency
> as well since commits write to WAL. No need to generate event ticks
> either, just use XLogRecPtrs as WALSender already does.
>
> I see no problem with integrating that into core, technically or
> philosophically.
>

Which means that if I want to allow a consumer of that commit order data
to go offline for three days or so to replicate the 5 requested, low
volume tables, the origin needs to hang on to the entire WAL log from
all 100 other high volume tables?

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jan Wieck on 25 May 2010 22:44

On 5/25/2010 4:16 PM, Tom Lane wrote:
> Jan Wieck <JanWieck(a)Yahoo.com> writes:
>>> No, I meant how will the *function* know, if a superuser and/or some
>>> background process can purge records at any time?
>
>> The data contains timestamps which are supposedly taken in commit order.
>
> You can *not* rely on the commit timestamps to be in exact order.
> (Perhaps approximate ordering is good enough for what you want here,
> but just be careful to not fall into the trap of assuming that they're
> exactly ordered.)

I am well aware of the fact that commit timestamps within the WAL can go
backwards and that the serial numbers of this proposed implementation of
commit order can even be different from what the timestamps AND the WAL
are saying.

As long as the serial number (record position inside of segment) is
determined while the transaction still holds all its locks, this is
going to be good enough for what async replication users today are used
to. Again, it will not magically make it possible to determine a
serializable order of actions, that happened from transactions running
in read committed isolation level, post mortem. I don't even even think
that is possible at all.

And I don't think anyone proposed a solution for that problem anyways.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jan Wieck on 26 May 2010 07:08

On 5/26/2010 7:03 AM, Jan Wieck wrote:
> To replicate from one consistent state to the next, the replication
> system now selects all log rows between two snapshots. Between here
> means it simulates MVCC visibility in the sense of that the writing
> transaction was in progress when the first snapshot was taken and had
> committed at the second. The resulting WHERE clause looks something like

Or it entirely happened between the snapshots, obviously.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Prev: [HACKERS] Exposing the Xact commit order to the user
Next: [HACKERS] Synchronization levels in SR