From: Simon Riggs on 20 Jan 2010 04:40 On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote: > > > > Full resolution patch attached for Startup process waits on buffer pins. > > > > Startup process sets SIGALRM when waiting on a buffer pin. If woken by > > alarm we send SIGUSR1 to all backends requesting that they check to see > > if they are blocking Startup process. If so, they throw ERROR/FATAL as > > for other conflict resolutions. Deadlock stop gap removed. > > max_standby_delay = -1 option removed to prevent deadlock. > Wouldnt it be more foolproof to also loop around sending the FATAL? Not that > its likely but... More foolproof and much less accurate. The Startup process doesn't know who is holding the buffer pin that blocks it, so it could not target a FATAL. > From HoldingBufferPinThatDelaysRecovery youre calling > GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that one is > acquiring a spinlock which can also get taken at other places. Its not the > most likely scenario, but it would certainly be annoying to debug. Spinlock. It isn't held for long in any situation. What problem do you foresee? > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would doubt > so? If not the locking in GetStartupBufferPinWaitBufId and > SetStartupBufferPinWaitBufId shouldnt be needed? I prefer spinlocking. > Same issue issue (and more likely to trigger) exists with CheckStandbyTimeout- > >SendRecoveryConflictWithBufferPin->CancelDBBackends I don't see an issue. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andres Freund on 20 Jan 2010 04:45 On Wednesday 20 January 2010 06:30:28 Tom Lane wrote: > Andres Freund <andres(a)anarazel.de> writes: > > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would > > doubt so? > > Er ... what? I believe there are live platforms with sig_atomic_t = char. > If we're assuming more that's a must-fix. The reason I have asked is that the code is doing things like: /* * Used by backends when they receive a request to check for buffer pin waits. */ int GetStartupBufferPinWaitBufId(void) { int bufid; /* use volatile pointer to prevent code rearrangement */ volatile PROC_HDR *procglobal = ProcGlobal; SpinLockAcquire(ProcStructLock); bufid = procglobal->startupBufferPinWaitBufId; SpinLockRelease(ProcStructLock); return bufid; } or similar things with LWLockAcquire in a signal handler which strikes me as a not that good idea. As at least on x86 reading an integer is atomic the above spinlock is pointless. My cross arch experience is barely existing, so... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andres Freund on 20 Jan 2010 04:49 On Wednesday 20 January 2010 10:40:10 Simon Riggs wrote: > On Wed, 2010-01-20 at 06:14 +0100, Andres Freund wrote: > > > Full resolution patch attached for Startup process waits on buffer > > > pins. > > > > > > Startup process sets SIGALRM when waiting on a buffer pin. If woken by > > > alarm we send SIGUSR1 to all backends requesting that they check to see > > > if they are blocking Startup process. If so, they throw ERROR/FATAL as > > > for other conflict resolutions. Deadlock stop gap removed. > > > max_standby_delay = -1 option removed to prevent deadlock. > > > > Wouldnt it be more foolproof to also loop around sending the FATAL? Not > > that its likely but... > > More foolproof and much less accurate. The Startup process doesn't know > who is holding the buffer pin that blocks it, so it could not target a > FATAL. > > > From HoldingBufferPinThatDelaysRecovery youre calling > > GetStartupBufferPinWaitBufId - that sounds a bit dangerous because that > > one is acquiring a spinlock which can also get taken at other places. > > Its not the most likely scenario, but it would certainly be annoying to > > debug. > Spinlock. It isn't held for long in any situation. What problem do you > foresee? If any backend is signalled while currently holding the ProcStructLock there is a basically unrecoverable deadlock - its not likely but possible. > > Is there any supported platform with sizeof(sig_atomic_t) <4 - I would > > doubt so? If not the locking in GetStartupBufferPinWaitBufId and > > SetStartupBufferPinWaitBufId shouldnt be needed? > I prefer spinlocking. Well, its deadlock land taking the same lock inside and outside of a signal handler... Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 20 Jan 2010 04:52 On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: > LWLockAcquire I'm using spinlocks, not lwlocks. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andres Freund on 20 Jan 2010 05:04
On Wednesday 20 January 2010 10:52:24 Simon Riggs wrote: > On Wed, 2010-01-20 at 10:45 +0100, Andres Freund wrote: > > LWLockAcquire > > I'm using spinlocks, not lwlocks. CancelDBBackends which is used in SendRecoveryConflictWithBufferPin which in turn used by CheckStandbyTimeout triggered by SIGALRM acquires the lwlock. Now that case is a bit less dangerous because you would have to interrupt yourself to trigger a deadlock there because the code sleeps soon after setting up the handler. If ever two SIGALRM occur consecutive there is a problem. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |