From: Tom Lane on 19 Apr 2010 14:34

I wrote:
> The point is that a standalone backend will fail to execute recovery
> correctly:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg01297.php

After digging around a bit, it seems like the cleanest solution would be
to move the responsibility for calling StartupXLOG in a standalone
backend into InitPostgres. At the point where the latter currently has

    /*
     * Initialize local process's access to XLOG, if appropriate.  In
     * bootstrap case we skip this since StartupXLOG() was run instead.
     */
    if (!bootstrap)
        (void) RecoveryInProgress();

we'd add a couple of lines to call StartupXLOG if !IsUnderPostmaster,
and then remove the call from postgres.c. I haven't tested this yet
but it looks like the correct state has been set up at that point.

Anyone see any obvious holes in the idea?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
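For readers following the proposal above, here is a rough, self-contained model of the control flow it describes. This is only a sketch, not PostgreSQL source: StartupXLOG, RecoveryInProgress and IsUnderPostmaster are replaced by local stubs, and InitPostgresSketch, process_state_ready and the two counters are hypothetical names invented for illustration. The placement of the new call relative to RecoveryInProgress is likewise an assumption, since the email only says "a couple of lines" would be added at that point.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for PostgreSQL globals and functions; all stubs are hypothetical. */
static bool IsUnderPostmaster = false;    /* false in a standalone backend */
static bool process_state_ready = false;  /* models per-process state such as PGPROC */
static int  startup_xlog_calls = 0;
static int  recovery_checks = 0;

/* Stub: recovery may need per-process state, so insist that it exists. */
static void StartupXLOG(void)
{
    assert(process_state_ready);
    startup_xlog_calls++;
}

/* Stub: the real function reports whether recovery is still running. */
static bool RecoveryInProgress(void)
{
    recovery_checks++;
    return false;
}

/* Sketch of the proposed shape of the XLOG step inside InitPostgres. */
static void InitPostgresSketch(bool bootstrap)
{
    /* Earlier initialization is elided; assume it leaves process state ready. */
    process_state_ready = true;

    if (!bootstrap)
    {
        /* Proposed addition: a standalone backend runs recovery here. */
        if (!IsUnderPostmaster)
            StartupXLOG();

        (void) RecoveryInProgress();
    }
}
```

In this model, calling InitPostgresSketch(false) with IsUnderPostmaster left false runs the StartupXLOG stub once, after the process state is marked ready. With IsUnderPostmaster set to true, or with bootstrap true, the stub is not called.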
From: Tom Lane on 19 Apr 2010 16:18

BTW, just for the archives' sake: it took a good long time to develop a
reproducible test case for this. It seems that 99% of the WAL replay
code does *not* depend on the missing state. I was eventually able to
reproduce the case originally reported, namely a crash during
btree_xlog_cleanup; but to get that you need

(a) WAL to end between a btree page split and insertion of the new
    parent record, and
(b) have the resulting insertion need to obtain a new btree page, ie
    there's another split or newroot forced then, and
(c) not have any available pages in the index's FSM, so that we have to
    LockRelationForExtension, which is what crashes for lack of a PGPROC.

So that probably explains why we went so long without recognizing the
bug.

This again points up the fact that WAL recovery isn't as well tested as
one could wish. Koichi-san's efforts to create a test with 100%
coverage of all types of WAL records are good, but that'd not have
helped us to find this. We should think about ways to provide better
test coverage of end-of-WAL cleanup.

regards, tom lane