From: Marko Kreen on 23 Apr 2010 14:22

On 4/23/10, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Marko Kreen <markokr(a)gmail.com> writes:
> > Um, you have been burned by exactly this on x86 also:
> > http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php
>
> Yeah, we never did figure out exactly how come you were observing that
> failure on Intel-ish hardware. I was under the impression that Intel
> machines didn't have weak-memory-ordering behavior.
>
> I wonder whether your compiler had rearranged the code in ProcArrayAdd
> so that the increment happened before the array element store at the
> machine-code level. I think it would be entitled to do that under
> standard C semantics, since that ProcArrayStruct pointer isn't marked
> volatile.

Sounds likely. Which seems to hint that it's better to treat all processors as weakly ordered and work with explicit locks/memory barriers than to sprinkle the code with 'volatile' to suppress optimizations on Intel, and then still fail on non-Intel.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
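[Editor's note: the reordering Tom describes can be sketched in a few lines of C. This is not the actual PostgreSQL ProcArrayAdd; the struct, names, and sizes are simplified stand-ins. With a plain (non-volatile) pointer, the compiler may legally reorder the two stores; the `volatile` qualifier or a compiler barrier pins their order at the machine-code level, though a weakly ordered CPU would additionally need a hardware memory barrier, which is Marko's point.]

```c
#include <stddef.h>

#define MAX_PROCS 8

/* Simplified stand-in for PostgreSQL's ProcArrayStruct. */
typedef struct
{
    int numProcs;
    int procs[MAX_PROCS];
} ProcArrayStruct;

ProcArrayStruct arrayData;

/*
 * Unsafe sketch: with a non-volatile pointer, standard C lets the
 * compiler reorder these two stores. A concurrent reader that sees
 * the incremented numProcs may then read a stale procs[] slot.
 */
void
proc_array_add_unsafe(int pid)
{
    ProcArrayStruct *arr = &arrayData;

    arr->procs[arr->numProcs] = pid;
    arr->numProcs++;
}

/*
 * Safer sketch: the volatile qualifier forbids the compiler from
 * reordering the accesses, and the empty asm with a "memory" clobber
 * acts as an explicit compiler barrier between the two stores.
 */
void
proc_array_add_safe(int pid)
{
    volatile ProcArrayStruct *arr = &arrayData;

    arr->procs[arr->numProcs] = pid;
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
    arr->numProcs++;
}
```

Note that neither version emits a hardware fence, so on a weakly ordered CPU (e.g. POWER, older ARM) the stores could still become visible out of order to another processor.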
From: Robert Haas on 25 Apr 2010 06:53

On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> Both Heikki and I objected to that patch.
>
> Please explain your objection, based upon the patch and my explanations.

Well, we objected to the locking. Having reread the patch a few times though, I think I'm starting to wrap my head around it so, I don't know, maybe it's OK. Have you tested grabbing the ProcArrayLock in exclusive mode instead of having a separate spinlock, to see how that performs?

>> And apparently it doesn't
>> fix the problem, either. So, -1 from me.
>
> There is an issue observed in Erik's later tests, but my interpretation
> of the results so far is that the sorted array patch successfully
> removes the initially reported loss of performance.

Is it possible the remaining spikes are due to fights over the spinlock?

...Robert
From: Simon Riggs on 25 Apr 2010 08:50

On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> >> Both Heikki and I objected to that patch.
> >
> > Please explain your objection, based upon the patch and my explanations.
>
> Well, we objected to the locking. Having reread the patch a few times
> though, I think I'm starting to wrap my head around it so, I don't
> know, maybe it's OK. Have you tested grabbing the ProcArrayLock in
> exclusive mode instead of having a separate spinlock, to see how that
> performs?

They both perform well. Heikki and I agree that spinlocks perform well.

The point of taking the ProcArrayLock in Shared rather than Exclusive mode is the reduction in contention that gives. Adding a KnownAssignedXid is exactly analogous to starting a transaction. We go to some trouble to avoid taking a lock when we start a transaction, and I want to do a similar thing on the standby. I think it's even more important that we reduce contention on the standby than on the primary, because if the Startup process is delayed then it slows down recovery.

> >> And apparently it doesn't
> >> fix the problem, either. So, -1 from me.
> >
> > There is an issue observed in Erik's later tests, but my interpretation
> > of the results so far is that the sorted array patch successfully
> > removes the initially reported loss of performance.
>
> Is it possible the remaining spikes are due to fights over the spinlock?

That one specific spinlock? I considered it. I think it's doubtful.

The spikes all involved a jump from <5ms to much more than that, often as long as a second. That looks like a pure I/O problem, an LWLock held across a slow I/O, or platform-specific issues of some kind.

Erik's tests were with 1 backend, so that would mean that at most 2 processes might want the spinlock you mention. The spinlock is only taken at snapshot time, so once per transaction in Erik's tests. So if the spinlock were the problem, then one or other of the processes would need to grab and hold the spinlock for a very long time and then release it sometime later. Since the spinlock code has been with us for some time, I doubt there is anything wrong there.

I don't have any evidence as to whether the problem is solely on Erik's system, nor whether it is an issue already in the codebase or introduced by this patch. It seems unlikely to me that this patch introduces the problem, and even more unlikely that the issue is caused by two processes fighting over that particular spinlock.

The patch is very effective at reducing the overall cost of taking snapshots, and so I think it needs to be applied to allow more performance data to be collected. I don't suppose this is the last performance tuning we will need to do.

-- 
Simon Riggs
www.2ndQuadrant.com
From: Robert Haas on 25 Apr 2010 11:37

On Sun, Apr 25, 2010 at 8:50 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
>> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> >> Both Heikki and I objected to that patch.
>> >
>> > Please explain your objection, based upon the patch and my explanations.
>>
>> Well, we objected to the locking. Having reread the patch a few times
>> though, I think I'm starting to wrap my head around it so, I don't
>> know, maybe it's OK. Have you tested grabbing the ProcArrayLock in
>> exclusive mode instead of having a separate spinlock, to see how that
>> performs?
>
> They both perform well. Heikki and I agree that spinlocks perform well.
>
> The point of taking the ProcArrayLock in Shared rather than Exclusive
> mode is the reduction in contention that gives. Adding a
> KnownAssignedXid is exactly analogous to starting a transaction. We go
> to some trouble to avoid taking a lock when we start a transaction and I
> want to do a similar thing on the standby. I think it's even more
> important we reduce contention on the standby than it is on the primary
> because if the Startup process is delayed then it slows down recovery.
>
>> >> And apparently it doesn't
>> >> fix the problem, either. So, -1 from me.
>> >
>> > There is an issue observed in Erik's later tests, but my interpretation
>> > of the results so far is that the sorted array patch successfully
>> > removes the initially reported loss of performance.
>>
>> Is it possible the remaining spikes are due to fights over the spinlock?
>
> That one specific spinlock? I considered it. I think it's doubtful.
>
> The spikes all involved a jump from <5ms to much more than that, often
> as long as a second. That looks like a pure I/O problem or an LWLock
> held across a slow I/O or platform-specific issues of some kind.
>
> Erik's tests were with 1 backend, so that would mean that at most 2
> processes might want the spinlock you mention. The spinlock is only
> taken at snapshot time, so once per transaction in Erik's tests. So if
> the spinlock were the problem then one or other of the processes
> would need to grab and hold the spinlock for a very long time and then
> release it sometime later. Since the spinlock code has been with us for
> some time, I doubt there is anything wrong there.
>
> I don't have any evidence as to whether the problem is solely on Erik's
> system, nor whether it is an issue already in the codebase or introduced
> by this patch. It seems unlikely to me that this patch introduces the
> problem, and even more unlikely that the issue is caused by two
> processes fighting over that particular spinlock.
>
> The patch is very effective at reducing overall cost of taking snapshots
> and so I think it needs to be applied to allow more performance data to
> be collected. I don't suppose this is the last performance tuning we
> will need to do.

OK, fair enough. I withdraw my objections.

...Robert
From: Tom Lane on 25 Apr 2010 11:50
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>>> Both Heikki and I objected to that patch.
>>
>> Please explain your objection, based upon the patch and my explanations.
>
> Well, we objected to the locking.

Not only is the locking overly complex, but it's outright wrong; or at least the comments are. KnownAssignedXidsAdd in particular has a comment that is 100% misleading, and an actual behavior that I find extremely likely to provoke bugs (e.g., consider a caller calling it twice under the illusion it still has only shared lock). I'm not even convinced that it works correctly, since it will happily write into KnownAssignedXids[] while holding only shared ProcArrayLock. What happens when two processes are doing that concurrently?

This needs a redesign before it can be considered committable. I don't really care whether it makes things faster; it's too complicated and too poorly documented to be maintainable.

			regards, tom lane
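[Editor's note: the race Tom asks about — two processes writing into the array while each holds only a shared lock and nothing serializes the writers — can be replayed deterministically in a single thread by interleaving the steps by hand. This is an illustrative model, not PostgreSQL code; the names and the interleaving are assumed.]

```c
#define MAX_KNOWN_XIDS 64

static unsigned int known_xids[MAX_KNOWN_XIDS];
static int num_known_xids = 0;

/*
 * Deterministic replay of the problematic interleaving: writers A and B
 * both hold the big lock in shared mode, and with no writer-side
 * serialization both read the counter before either has published
 * its element. B then clobbers A's slot and A's xid is lost.
 */
void
simulate_unserialized_writers(void)
{
    int slot_a = num_known_xids;    /* writer A reads index 0 */
    int slot_b = num_known_xids;    /* writer B also reads index 0 */

    /* writer A stores its xid and bumps the counter */
    known_xids[slot_a] = 100;
    num_known_xids = slot_a + 1;

    /* writer B overwrites the same slot and re-publishes count 1 */
    known_xids[slot_b] = 200;
    num_known_xids = slot_b + 1;
}
```

After the call the array holds one xid (200) instead of two: a transaction silently vanishes from the known-assigned set, which on a standby would make it invisible to snapshots. A spinlock around the read-index/store/increment sequence removes this interleaving, which is what the patch's writer path relies on.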