Proposal for 9.1: WAL streaming from WAL buffers [PgSql]

Prev: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers
Next: [HACKERS] pg_upgrade output directory

From: Tom Lane on 11 Jun 2010 10:47

Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> writes:
> hmm not sure that is what fujii tried to say - I think his point was
> that in the original case we would have serialized all the operations
> (first write+sync on the master, network afterwards and write+sync on
> the slave) and now we could try parallelizing by sending the wal before
> we have synced locally.

Well, we're already not waiting for fsync, which is the slowest part.
If there's a performance problem, it may be because FADVISE_DONTNEED
disables kernel buffering so that we're forced to actually read the data
back from disk before sending it on down the wire.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Stefan Kaltenbrunner on 11 Jun 2010 11:15

On 06/11/2010 04:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner<stefan(a)kaltenbrunner.cc> writes:
>> hmm not sure that is what fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync on
>> the slave) and now we could try parallelizing by sending the wal before
>> we have synced locally.
>
> Well, we're already not waiting for fsync, which is the slowest part.
> If there's a performance problem, it may be because FADVISE_DONTNEED
> disables kernel buffering so that we're forced to actually read the data
> back from disk before sending it on down the wire.

hmm ok - but assuming sync rep we would end up with something like the
following(hypotetically assuming each operation takes 1 time unit):

originally:

write 1
sync 1
network 1
write 1
sync 1

total: 5

whereas in the new case we would basically have the write+sync compete
with network+write+sync in parallel(total 3 units) and we would only
have to wait for the slower of those two sets of operations instead of
the total time of both or am I missing something.

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 11 Jun 2010 18:16

> Well, we're already not waiting for fsync, which is the slowest part.
> If there's a performance problem, it may be because FADVISE_DONTNEED
> disables kernel buffering so that we're forced to actually read the data
> back from disk before sending it on down the wire.

Well, that's fairly direct to solve, no? Just disable FADVISE_DONTNEED
if walsenders > 0.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 11 Jun 2010 21:10

> Hm, but then Robert's failure case is real, and streaming replication might break due to an OS-level crash of the master. Or am I missing something?

Well, in the failover case this isn't a problem, it's a benefit: the
standby gets a transaction which you would have lost off the master.
However, I can see this as a problem in the event of a server-room
powerout with very bad timing where there isn't a failover to the standby:

1) Master goes out
2) "floating" transaction applied to standby.
3) Standby goes out
4) Power back on
5) master comes up
6) standby comes up

It seems like, in that sequence, the standby would have one transaction
which the master doesn't have, yet the standby thinks it can continue
getting WAL from the master. Or did I miss something which makes this
impossible?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Florian Pflug on 11 Jun 2010 20:34

On Jun 11, 2010, at 16:31 , Tom Lane wrote:
> Fujii Masao <masao.fujii(a)gmail.com> writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the disk.
>
> I believe the above statement to be incorrect: walsender does *not* wait
> for an fsync to occur.

Hm, but then Robert's failure case is real, and streaming replication might break due to an OS-level crash of the master. Or am I missing something?

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers
Next: [HACKERS] pg_upgrade output directory