From: "Kevin Grittner" on 21 Dec 2009 10:21 Hiroyuki Yamada <yamada(a)kokolink.net> wrote: > 4. Startup process tries to redo XLOG_HEAP2_CLEAN record, calls > LockBufferForCleanup() and freezes until the Xact 1 ends. I think they word you're searching for is "blocks". Blocking to protect integrity doesn't sound like a bug to me; perhaps an opportunity for enhancement. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 19 Jan 2010 05:46 On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote: > Hot Standby node can freeze when startup process calls LockBufferForCleanup(). > This bug can be reproduced by the following procedure. > > 0. start Hot Standby, with one active node(node A) and one standby node(node B) > 1. create table X and table Y in node A > 2. insert several rows in table X in node A > 3. delete one row from table X in node A > 4. begin xact 1 in node A, execute following commands, and leave xact 1 open > 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE > 5. wait until WAL's for above actions are applied in node B > 6. begin xact 2 in node B, and execute following commands > 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; > 6.2 FETCH test_cursor; > 6.3 SELECT * FROM table Y; > 7. execute VACUUM FREEZE table A in node A > 8. commit xact 1 in node A > > ...then in node B occurs following "deadlock" situation, which is not detected by deadlock check. > * startup process waits for xact 2 to release buffers in table X (in LockBufferForCleanup()) > * xact 2 waits for startup process to release ACCESS EXCLUSIVE lock in table Y Deadlock bug was prevented by stop-gap measure in December commit. Full resolution patch attached for Startup process waits on buffer pins. Startup process sets SIGALRM when waiting on a buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock. Reviews welcome, otherwise commit at end of week. -- Simon Riggs www.2ndQuadrant.com
From: Andres Freund on 20 Jan 2010 00:14 On Tuesday 19 January 2010 11:46:24 Simon Riggs wrote: > On Tue, 2009-12-15 at 20:11 +0900, Hiroyuki Yamada wrote: > > Hot Standby node can freeze when startup process calls > > LockBufferForCleanup(). This bug can be reproduced by the following > > procedure. > > > > 0. start Hot Standby, with one active node(node A) and one standby > > node(node B) 1. create table X and table Y in node A > > 2. insert several rows in table X in node A > > 3. delete one row from table X in node A > > 4. begin xact 1 in node A, execute following commands, and leave xact 1 > > open 4.1 LOCK table Y IN ACCESS EXCLUSIVE MODE > > 5. wait until WAL's for above actions are applied in node B > > 6. begin xact 2 in node B, and execute following commands > > 6.1 DECLARE CURSOR test_cursor FOR SELECT * FROM table X; > > 6.2 FETCH test_cursor; > > 6.3 SELECT * FROM table Y; > > 7. execute VACUUM FREEZE table A in node A > > 8. commit xact 1 in node A > > > > ...then in node B occurs following "deadlock" situation, which is not > > detected by deadlock check. > > > > * startup process waits for xact 2 to release buffers in table X (in > > LockBufferForCleanup()) * xact 2 waits for startup process to release > > ACCESS EXCLUSIVE lock in table Y > > Deadlock bug was prevented by stop-gap measure in December commit. > > Full resolution patch attached for Startup process waits on buffer pins. > > Startup process sets SIGALRM when waiting on a buffer pin. If woken by > alarm we send SIGUSR1 to all backends requesting that they check to see > if they are blocking Startup process. If so, they throw ERROR/FATAL as > for other conflict resolutions. Deadlock stop gap removed. > max_standby_delay = -1 option removed to prevent deadlock. Wouldnt it be more foolproof to also loop around sending the FATAL? Not that its likely but... From HoldingBufferPinThatDelaysRecovery youre calling GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is acquiring a spinlock which can also get taken at other places. Its not the most likely scenario, but it would certainly be annoying to debug. Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt so? If not the locking in GetStartupBufferPinWaitBufId and SetStartupBufferPinWaitBufId shouldnt be needed? Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout- >SendRecoveryConflictWithBufferPin->CancelDBBackends Too tired to look further. Greetings, Andres PS: Sorry for not doing anything on the idle front - I have been burried alive the last days. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 20 Jan 2010 00:30 Andres Freund <andres(a)anarazel.de> writes: > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt > so? Er ... what? I believe there are live platforms with sig_atomic_t = char. If we're assuming more that's a must-fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andres Freund on 20 Jan 2010 04:35
On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: > Andres Freund <andres(a)anarazel.de> writes: > > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would > > doubt so? > > Er ... what? I believe there are live platforms with sig_atomic_t = char. > If we're assuming more that's a must-fix. So were assuming genereally that a integer cannot be read/written atomatically? Or was that only relating to the actualy signal.h typedef? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |