Proposal for 9.1: WAL streaming from WAL buffers [PgSql]

From: Robert Haas on 14 Jun 2010 09:13

On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>> that we have no way of recovering from the situation where the standby
>> gets ahead of the master.
>
> No, we can do that by reconstructing the standby from the backup.
>
> And, that situation is not a problem for users including me who prefer to
> perform a failover when the master goes down.

You don't get to pick - if a backend crashes on the master, it will
restart right away and come up, but the slave will now be hosed...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 14 Jun 2010 11:02

Fujii Masao <masao.fujii(a)gmail.com> writes:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Well, we're already not waiting for fsync, which is the slowest part.

> No, currently walsender waits for fsync.

No, you're mistaken.

> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.

Wrong. LogwrtResult.Write tracks how far we've written out data,
but it is only (known to be) fsync'd as far as LogwrtResult.Flush.

> But that change would cause the problem that Robert pointed out.
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php

Yes. Possibly walsender should only be allowed to send as far as
LogwrtResult.Flush.
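
The distinction between the two pointers can be illustrated with a minimal sketch in C. This is not the actual xlog.c or walsender code: `WalPosition`, `LogwrtSketch`, and `send_limit` are hypothetical names, and the shared-memory locking that real PostgreSQL needs is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position: a plain byte offset. */
typedef uint64_t WalPosition;

/* Simplified model of the two pointers discussed above. */
typedef struct LogwrtSketch
{
    WalPosition Write;  /* WAL handed to the kernel up to here */
    WalPosition Flush;  /* WAL known to be fsync'd up to here  */
} LogwrtSketch;

/*
 * How far a sender may stream. With flushed_only set, the sender stops
 * at the Flush pointer, so the standby never receives WAL that the
 * master could still lose in a crash.
 */
static WalPosition
send_limit(const LogwrtSketch *state, bool flushed_only)
{
    return flushed_only ? state->Flush : state->Write;
}
```

With `Write` at 1000 and `Flush` at 600, a sender restricted to flushed WAL would stop at 600, while an unrestricted one could stream up to 1000 and leave the standby ahead of what the master has made durable.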

regards, tom lane

From: Fujii Masao on 15 Jun 2010 00:46

On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao <masao.fujii(a)gmail.com> wrote:
>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>>> that we have no way of recovering from the situation where the standby
>>> gets ahead of the master.
>>
>> No, we can do that by reconstructing the standby from the backup.
>>
>> And, that situation is not a problem for users including me who prefer to
>> perform a failover when the master goes down.
>
> You don't get to pick - if a backend crashes on the master, it will
> restart right away and come up, but the slave will now be hosed...

Are you concerned in particular about the case where the postmaster
automatically restarts and runs crash recovery? Yes, that case is more
problematic. If the standby is ahead of the master, the standby might find
an invalid record and fall into an infinite retry loop, or keep working
without noticing the inconsistency between the database and the WAL.

I'm thinking that walreceiver should throw a PANIC when it receives a
record whose LSN is older than the last WAL receive location, except at
the beginning of streaming (because the standby always requests streaming
from the start of a WAL file at first, even if some records have already
been received previously). Thoughts?
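
The proposed check could be sketched roughly as follows. This is only an illustration of the idea in this message, not walreceiver code: `WalPosition`, `ReceiveCheck`, and `check_received_start` are hypothetical names, and the caller is assumed to raise the PANIC when `RECEIVE_REWOUND` comes back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position: a plain byte offset. */
typedef uint64_t WalPosition;

typedef enum ReceiveCheck
{
    RECEIVE_OK,         /* data starts at or after what we already hold */
    RECEIVE_REWOUND     /* master sent WAL older than what we already hold */
} ReceiveCheck;

/*
 * Compare the start of an incoming chunk against the last receive
 * location. The first message of a streaming session is exempt, because
 * the standby asks to restart from the beginning of a WAL file.
 */
static ReceiveCheck
check_received_start(WalPosition last_received,
                     WalPosition incoming_start,
                     bool first_message)
{
    if (first_message)
        return RECEIVE_OK;
    if (incoming_start < last_received)
        return RECEIVE_REWOUND;
    return RECEIVE_OK;
}
```

The exemption for the first message is what keeps a normal reconnect from being mistaken for a master that has fallen behind the standby.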

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Fujii Masao on 15 Jun 2010 00:47

On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii(a)gmail.com> writes:
>> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> Well, we're already not waiting for fsync, which is the slowest part.
>
>> No, currently walsender waits for fsync.
>
> No, you're mistaken.
>
>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
>
> Wrong. LogwrtResult.Write tracks how far we've written out data,
> but it is only (known to be) fsync'd as far as LogwrtResult.Flush.

Hmm.. I agree that xlogctl->LogwrtResult.Write indicates the byte position
we've written. But in the current XLogWrite() code, it's updated after
XLogWrite() calls issue_xlog_fsync(). No?

Of course, the backend-local LogwrtResult.Write is updated before
issue_xlog_fsync(), but it's not visible to walsender.

Am I missing something?

>> But that change would cause the problem that Robert pointed out.
>> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
>
> Yes. Possibly walsender should only be allowed to send as far as
> LogwrtResult.Flush.

Yes, in order to avoid that problem, walsender should wait for WAL
to be fsync'd before sending it.

But I'm worried that this would significantly slow down the master,
because WAL flush and WAL streaming would not be performed concurrently
and the backend would have to wait for both serially.
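
The cost being described can be put in rough arithmetic terms. The sketch below assumes the backend has to wait for both the local flush and the transfer to the standby; `CommitCosts` and both functions are hypothetical names, and any numbers fed to them are illustrative, not measurements.

```c
/* Hypothetical per-commit costs in milliseconds. */
typedef struct CommitCosts
{
    double flush_ms;    /* time to fsync WAL on the master */
    double send_ms;     /* time to stream the same WAL to the standby */
} CommitCosts;

/* Streaming starts only after the flush finishes: the waits add up. */
static double
serial_wait_ms(CommitCosts c)
{
    return c.flush_ms + c.send_ms;
}

/* Flush and streaming overlap: the backend waits for the slower one. */
static double
overlapped_wait_ms(CommitCosts c)
{
    return c.flush_ms > c.send_ms ? c.flush_ms : c.send_ms;
}
```

With a 5 ms flush and a 2 ms transfer, the serial path waits 7 ms while the overlapped path waits 5 ms, which is the gap this concern is about.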

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From: Heikki Linnakangas on 15 Jun 2010 01:16

On 15/06/10 07:47, Fujii Masao wrote:
> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane<tgl(a)sss.pgh.pa.us> wrote:
>> Fujii Masao<masao.fujii(a)gmail.com> writes:
>>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>>> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
>>
>> Wrong. LogwrtResult.Write tracks how far we've written out data,
>> but it is only (known to be) fsync'd as far as LogwrtResult.Flush.
>
> Hmm.. I agree that xlogctl->LogwrtResult.Write indicates the byte position
> we've written. But in the current XLogWrite() code, it's updated after
> XLogWrite() calls issue_xlog_fsync(). No?

issue_xlog_fsync() is only called if the caller requested a flush by
advancing WriteRqst.Flush.
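
A much-simplified sketch of that control flow, in C, may help. It is not the real XLogWrite(): the types, `write_sketch`, and the `fake_fsync` counter are hypothetical stand-ins, and page handling, segment switches, and locking are all omitted.

```c
#include <stdint.h>

/* Hypothetical stand-in for a WAL position: a plain byte offset. */
typedef uint64_t WalPosition;

typedef struct { WalPosition Write; WalPosition Flush; } WalRequest;
typedef struct { WalPosition Write; WalPosition Flush; } WalResult;

/* Counts how often the sketch would have called fsync. */
static int fsync_calls = 0;

static void
fake_fsync(void)
{
    fsync_calls++;
}

/* Advance Write as requested; fsync only if the caller asked for a flush. */
static void
write_sketch(WalResult *result, WalRequest rqst)
{
    if (rqst.Write > result->Write)
        result->Write = rqst.Write;     /* written out, not yet synced */

    if (rqst.Flush > result->Flush && result->Flush < result->Write)
    {
        fake_fsync();
        result->Flush = result->Write;  /* everything written is now durable */
    }
}
```

A request that advances only `Write` leaves `Flush` untouched and issues no fsync, so in this model the Write pointer can run ahead of the Flush pointer.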

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
