From: Marko Kreen on
On 4/23/10, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Marko Kreen <markokr(a)gmail.com> writes:
> > Um, you have been burned by exactly this on x86 also:
> > http://archives.postgresql.org/pgsql-hackers/2009-03/msg01265.php
>
>
> Yeah, we never did figure out exactly how come you were observing that
> failure on Intel-ish hardware. I was under the impression that Intel
> machines didn't have weak-memory-ordering behavior.
>
> I wonder whether your compiler had rearranged the code in ProcArrayAdd
> so that the increment happened before the array element store at the
> machine-code level. I think it would be entitled to do that under
> standard C semantics, since that ProcArrayStruct pointer isn't marked
> volatile.

Sounds likely.

Which seems to hint its better to handle all processors as weak ordered
and then work with explicit locks/memory barriers, than to sprinkle
code with 'volatile' to supress optimizations on intel and then still
fail on non-intel.

--
marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> Both Heikki and I objected to that patch.
>
> Please explain your objection, based upon the patch and my explanations.

Well, we objected to the locking. Having reread the patch a few times
though, I think I'm starting to wrap my head around it so, I don't
know, maybe it's OK. Have you tested grabbing the ProcArrayLock in
exclusive mode instead of having a separate spinlock, to see how that
performs?

>> And apparently it doesn't
>> fix the problem, either.  So, -1 from me.
>
> There is an issue observed in Erik's later tests, but my interpretation
> of the results so far is that the sorted array patch successfully
> removes the initially reported loss of performance.

Is it possible the remaining spikes are due to fights over the spinlock?

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Simon Riggs on
On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> >> Both Heikki and I objected to that patch.
> >
> > Please explain your objection, based upon the patch and my explanations.
>
> Well, we objected to the locking. Having reread the patch a few times
> though, I think I'm starting to wrap my head around it so, I don't
> know, maybe it's OK. Have you tested grabbing the ProcArrayLock in
> exclusive mode instead of having a separate spinlock, to see how that
> performs?

They both perform well. Heikki and I agree that spinlocks perform well.

The point of taking the ProcArrayLock in Shared rather than Exclusive
mode is the reduction in contention that gives. Adding a
KnownAssignedXid is exactly analogous to starting a transaction. We go
to some trouble to avoid taking a lock when we start a transaction and I
want to do a similar thing on the standby. I think it's even more
important we reduce contention on the standby than it is on the primary
because if the Startup process is delayed then it slows down recovery.

> >> And apparently it doesn't
> >> fix the problem, either. So, -1 from me.
> >
> > There is an issue observed in Erik's later tests, but my interpretation
> > of the results so far is that the sorted array patch successfully
> > removes the initially reported loss of performance.
>
> Is it possible the remaining spikes are due to fights over the spinlock?

That one specific spinlock? I considered it. I think its doubtful.

The spikes all involved a jump from <5ms to much more than that, often
as long as a second. That looks like a pure I/O problem or an LWlock
held across a slow I/O or platform specific issues of some kind.

Erik's tests were with 1 backend, so that would mean that at most 2
processes might want the spinlock you mention. The spinlock is only
taken at snapshot time, so once per transaction in Erik's tests. So the
if the spinlock were the problem then one or other of the processes
would need to grab and hold the spinlock for a very long time and then
release it sometime later. Since the spinlock code has been with us for
some time, I doubt there is anything wrong there.

I don't have any evidence as to whether the problem is solely on Erik's
system, nor whether it is an issue already in the codebase or introduced
by this patch. It seems unlikely to me that this patch introduces the
problem, and even more unlikely that the issue is caused by two
processes fighting over that particular spinlock.

The patch is very effective at reducing overall cost of taking snapshots
and so I think it needs to be applied to allow more performance data to
be collected. I don't suppose this is the last performance tuning we
will need to do.

--
Simon Riggs www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sun, Apr 25, 2010 at 8:50 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> On Sun, 2010-04-25 at 06:53 -0400, Robert Haas wrote:
>> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
>> >> Both Heikki and I objected to that patch.
>> >
>> > Please explain your objection, based upon the patch and my explanations.
>>
>> Well, we objected to the locking.  Having reread the patch a few times
>> though, I think I'm starting to wrap my head around it so, I don't
>> know, maybe it's OK.  Have you tested grabbing the ProcArrayLock in
>> exclusive mode instead of having a separate spinlock, to see how that
>> performs?
>
> They both perform well. Heikki and I agree that spinlocks perform well.
>
> The point of taking the ProcArrayLock in Shared rather than Exclusive
> mode is the reduction in contention that gives. Adding a
> KnownAssignedXid is exactly analogous to starting a transaction. We go
> to some trouble to avoid taking a lock when we start a transaction and I
> want to do a similar thing on the standby. I think it's even more
> important we reduce contention on the standby than it is on the primary
> because if the Startup process is delayed then it slows down recovery.
>
>> >> And apparently it doesn't
>> >> fix the problem, either.  So, -1 from me.
>> >
>> > There is an issue observed in Erik's later tests, but my interpretation
>> > of the results so far is that the sorted array patch successfully
>> > removes the initially reported loss of performance.
>>
>> Is it possible the remaining spikes are due to fights over the spinlock?
>
> That one specific spinlock? I considered it. I think its doubtful.
>
> The spikes all involved a jump from <5ms to much more than that, often
> as long as a second. That looks like a pure I/O problem or an LWlock
> held across a slow I/O or platform specific issues of some kind.
>
> Erik's tests were with 1 backend, so that would mean that at most 2
> processes might want the spinlock you mention. The spinlock is only
> taken at snapshot time, so once per transaction in Erik's tests. So the
> if the spinlock were the problem then one or other of the processes
> would need to grab and hold the spinlock for a very long time and then
> release it sometime later. Since the spinlock code has been with us for
> some time, I doubt there is anything wrong there.
>
> I don't have any evidence as to whether the problem is solely on Erik's
> system, nor whether it is an issue already in the codebase or introduced
> by this patch. It seems unlikely to me that this patch introduces the
> problem, and even more unlikely that the issue is caused by two
> processes fighting over that particular spinlock.
>
> The patch is very effective at reducing overall cost of taking snapshots
> and so I think it needs to be applied to allow more performance data to
> be collected. I don't suppose this is the last performance tuning we
> will need to do.

OK, fair enough. I withdraw my objections.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Sat, Apr 24, 2010 at 5:17 AM, Simon Riggs <simon(a)2ndquadrant.com> wrote:
> Both Heikki and I objected to that patch.
>>
>> Please explain your objection, based upon the patch and my explanations.

> Well, we objected to the locking.

Not only is the locking overly complex, but it's outright wrong;
or at least the comments are. KnownAssignedXidsAdd in particular
has a comment that is 100% misleading, and an actual behavior that
I find extremely likely to provoke bugs (eg, consider caller calling
it twice under the illusion it stlll has only shared lock). I'm not
even convinced that it works correctly, since it will happily write
into KnownAssignedXids[] while holding only shared ProcArrayLock.
What happens when two processes are doing that concurrently?

This needs a redesign before it can be considered committable. I don't
really care whether it makes things faster; it's too complicated and too
poorly documented to be maintainable.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers