From: Fujii Masao on 11 Jun 2010 09:14

Hi,

In 9.0, walsender always reads WAL from the disk and sends it to the standby. That is, we cannot send WAL until it has been written (and flushed) to the disk. This degrades the performance of synchronous replication very much, since a transaction commit must wait for the WAL write time *plus* the replication time.

The attached patch enables walsender to read data from WAL buffers in addition to the disk. Since we can write and send WAL simultaneously, in synchronous replication a transaction commit has to wait only for the longer of the two rather than their sum. So the performance should increase significantly.

Now three hackers (Zoltan, Simon and I) are planning to develop the synchronous replication feature. I'm not sure whose patch will be committed in the end. But since the attached patch provides just the infrastructure to optimize SR, it should work fine together with any of them and have a good effect.

I'll add the patch to the next CF. AFAIK the ReviewFest will start Jun 15. During that, if you are interested in the patch, please feel free to review it. You can also get the code change from my git repository:

    git://git.postgresql.org/git/users/fujii/postgres.git
    branch: read-wal-buffers

From here on I describe the change in detail.

At first, walsender reads WAL from the disk. If it has reached the current write location (i.e., there is no unsent WAL on the disk), then it attempts to read from WAL buffers. This buffer reading continues until the WAL to send has been purged from WAL buffers. IOW, if the WAL buffers are large enough and walsender keeps up with the insertion of WAL, it can read WAL from the buffers indefinitely.

Then, if the WAL to send has been purged from the buffers, walsender backs off and tries to read it from the disk. If we can find no WAL to send on the disk, walsender attempts to read WAL from the buffers again. Walsender repeats these operations.

The location of the oldest record in the buffers is saved in shared memory.
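As a rough illustration of the alternation described above, here is a minimal C sketch. It is not code from the patch: `WalState`, `ReadMode`, `Action` and `next_action` are hypothetical names, WAL locations are simplified to plain 64-bit offsets, and the shared state is assumed to have already been copied into a local snapshot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification: a WAL location is a plain byte offset. */
typedef uint64_t WalPos;

/* Hypothetical snapshot of the shared state the message refers to. */
typedef struct {
    WalPos oldest_in_buffers; /* oldest location still held in WAL buffers */
    WalPos written_upto;      /* WAL before this has been written to disk */
    WalPos inserted_upto;     /* WAL before this has been inserted into buffers */
} WalState;

typedef enum { MODE_DISK, MODE_BUFFERS } ReadMode;
typedef enum { ACT_READ_DISK, ACT_READ_BUFFERS, ACT_WAIT } Action;

/*
 * Decide where the next chunk of WAL starting at sent_upto should be read
 * from, updating the sender's current mode as a side effect.
 */
static Action next_action(ReadMode *mode, const WalState *st, WalPos sent_upto)
{
    assert(st->written_upto <= st->inserted_upto);

    if (*mode == MODE_DISK) {
        if (sent_upto < st->written_upto)
            return ACT_READ_DISK;       /* unsent WAL still on disk */
        *mode = MODE_BUFFERS;           /* caught up with disk: try buffers */
    }

    /* Buffer mode: keep reading buffers until the data has been purged. */
    if (sent_upto < st->oldest_in_buffers) {
        *mode = MODE_DISK;              /* purged: back off to the disk */
        return (sent_upto < st->written_upto) ? ACT_READ_DISK : ACT_WAIT;
    }
    if (sent_upto < st->inserted_upto)
        return ACT_READ_BUFFERS;
    return ACT_WAIT;                    /* nothing new to send yet */
}
```

Note that once in buffer mode the sketch stays there even after the same WAL has reached the disk, and only falls back when the requested location is older than `oldest_in_buffers`. That field stands in for the saved location of the oldest record in the buffers.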
The saved location of the oldest record is used to calculate whether a particular piece of WAL is in the buffers or not.

To avoid lock contention, walsender reads WAL buffers and XLogCtl->xlblocks without holding either WALInsertLock or WALWriteLock. Of course, they might be changed by buffer replacement while being read. So after reading them, we check that what we read was valid by comparing the location of the read WAL with the location of the oldest record in the buffers. This logic is similar to what XLogRead() does at the end.

This feature is required to prevent the performance of synchronous replication from dropping significantly. It can also cut the time that a transaction committed on the master takes to become visible on the standby, so it's useful for asynchronous replication, too.

Thoughts? Comments? Objections?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
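The read-then-validate step described in the message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch's code: `WalRing` and `read_unlocked` are hypothetical names, the buffers are modelled as a tiny byte ring, and the writer is assumed to advance `oldest_pos` before it overwrites a page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { PAGE_SIZE = 8, NUM_PAGES = 4 };  /* tiny hypothetical sizes */

typedef struct {
    _Atomic uint64_t oldest_pos;                /* oldest location still in the ring */
    unsigned char data[NUM_PAGES * PAGE_SIZE];  /* ring of WAL pages */
} WalRing;

/*
 * Copy len bytes starting at WAL location pos without taking any lock.
 * Returns false if the bytes were, or may have been, replaced, in which
 * case the caller should discard out and fall back to reading from disk.
 * The caller is assumed to know that pos + len does not exceed the
 * inserted location; that check is omitted here.
 */
static bool read_unlocked(WalRing *ring, uint64_t pos, unsigned char *out, size_t len)
{
    assert(len <= sizeof(ring->data));

    if (pos < atomic_load(&ring->oldest_pos))
        return false;                   /* already purged from the buffers */

    for (size_t i = 0; i < len; i++)
        out[i] = ring->data[(pos + i) % sizeof(ring->data)];

    /*
     * Validate after the copy: if the oldest location has moved past pos,
     * the page may have been replaced while we were reading it. A
     * production version would also have to make the racing byte copy
     * itself well-defined; this sketch only shows the check ordering.
     */
    atomic_thread_fence(memory_order_acquire);
    return pos >= atomic_load(&ring->oldest_pos);
}
```

The design choice is the usual optimistic one: readers pay nothing when no replacement happens, and a failed check costs only a discarded copy plus a retry from disk.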