From: "Strong, David" on
Simon,

In the 16/16 test (16 buffer partitions/16 lock partitions), the
WALInsertLock lock had 14643080 acquisition attempts and 12057678
successful acquisitions. That's 2585402 retries on the lock; that is to
say, PGSemaphoreLock was invoked 2585402 times.

In the 128/128 test, the WALInsertLock lock had 14991208 acquisition
attempts and 12324765 successful acquisitions. That's 2666443 retries.

The 128/128 test attempted 348128 more lock acquisitions than the 16/16
test and retried 81041 more times than the 16/16 test. We attribute the
rise in WALInsertLock accesses to the reduction in time spent acquiring
the BufMapping and LockMgr partition locks. Does this seem reasonable?

The overhead of any monitoring is of great concern to us. We've tried
both clock_gettime() and gettimeofday() calls. They both seem to have
the same overhead, ~1 us/call (measured against the TSC of the CPU), and
both seem to be accurate. We realize this can be a delicate point, so we
would be happy to rerun any tests with a different timing mechanism.
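For reference, the kind of standalone overhead check we mean is sketched
below. This is a sketch rather than our exact harness: it assumes an x86
TSC that ticks at a roughly constant rate, and the cycles-per-microsecond
constant has to be set for the CPU being tested.

/*
 * Rough check of timer-call overhead against the CPU's TSC.
 * Link with -lrt on older glibc for clock_gettime().
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <sys/time.h>

static inline uint64_t
rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}

int
main(void)
{
    const long      iters = 1000000;
    const double    cycles_per_us = 3000.0;   /* e.g. a 3 GHz CPU; adjust */
    struct timespec ts;
    struct timeval  tv;
    uint64_t        start, stop;
    long            i;

    start = rdtsc();
    for (i = 0; i < iters; i++)
        clock_gettime(CLOCK_REALTIME, &ts);
    stop = rdtsc();
    printf("clock_gettime : %.3f us/call\n",
           (stop - start) / cycles_per_us / iters);

    start = rdtsc();
    for (i = 0; i < iters; i++)
        gettimeofday(&tv, NULL);
    stop = rdtsc();
    printf("gettimeofday  : %.3f us/call\n",
           (stop - start) / cycles_per_us / iters);

    return 0;
}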

David

-----Original Message-----
From: Simon Riggs [mailto:simon(a)2ndquadrant.com]
Sent: Wednesday, September 13, 2006 2:22 AM
To: Tom Lane
Cc: Strong, David; PostgreSQL-development
Subject: Re: [HACKERS] Lock partitions

On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote:
> "Strong, David" <david.strong(a)unisys.com> writes:
> > When using 16 buffer and 16 lock partitions, we see that BufMapping
> > takes 809 seconds to acquire locks and 174 seconds to release locks.
> > The LockMgr takes 362 seconds to acquire locks and 26 seconds to
> > release locks.
>
> > When using 128 buffer and 128 lock partitions, we see that BufMapping
> > takes 277 seconds (532 seconds improvement) to acquire locks and 78
> > seconds (96 seconds improvement) to release locks. The LockMgr takes
> > 235 seconds (127 seconds improvement) to acquire locks and 22 seconds
> > (4 seconds improvement) to release locks.
>
> While I don't see any particular penalty to increasing
> NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very
> significant penalty (increasing PGPROC size as well as the work needed
> during LockReleaseAll, which is executed at every transaction end).
> I think 128 lock partitions is probably verging on the ridiculous
> ... particularly if your benchmark only involves touching half a dozen
> tables. I'd be more interested in comparisons between 4 and 16 lock
> partitions. Also, please vary the two settings independently rather
> than confusing the issue by changing them both at once.

Good thinking, David. Even if 128 is fairly high, it does seem worth
exploring higher values - I was just stuck in "fewer == better"
thoughts.

> > With the improvements in the various locking times, one might expect
> > an improvement in the overall benchmark result. However, a 16
> > partition run produces a result of 198.74 TPS and a 128 partition
> > run produces a result of 203.24 TPS.
>
> > Part of the time saved from BufMapping and LockMgr partitions is
> > absorbed into the WALInsertLock lock. For a 16 partition run, the
> > total time to lock/release the WALInsertLock lock is 5845 seconds.
> > For 128 partitions, the WALInsertLock lock takes 6172 seconds, an
> > increase of 327 seconds. Perhaps we have our WAL configured
> > incorrectly?
>
> I fear this throws your entire measurement procedure into question.
> For a fixed workload the number of acquisitions of WALInsertLock ought
> to be fixed, so you shouldn't see any more contention for WALInsertLock
> if the transaction rate didn't change materially.

David's results were to do with lock acquire/release time, not the
number of acquisitions, so that in itself doesn't make me doubt these
measurements. Perhaps we can ask whether there was a substantially
different number of lock acquisitions? As Tom says, that would be an
issue.

It seems reasonable that relieving the bottleneck on the BufMapping and
LockMgr locks would cause us to queue longer on the next bottleneck,
WALInsertLock. So again, those tests seem reasonable to me so far.

These seem to be the beginnings of accurate wait time analysis, so I'm
listening closely.

Are you using a lightweight timer?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



From: "Strong, David" on
Tom,

We have some results for you. We left the buffer partition locks at 128
as this did not seem to be a concern and we're still using 25 backend
processes. We ran tests for 4, 8 and 16 lock partitions.

For 4 lock partitions, it took 620 seconds to acquire locks and 32
seconds to release locks. The test produced 199.95 TPS.

For 8 lock partitions, it took 505 seconds to acquire locks and 31
seconds to release locks. The test produced 201.16 TPS.

For 16 lock partitions, it took 362 seconds to acquire locks and 22
seconds to release locks. The test produced 200.75 TPS.

And, just for grins, using 128 buffer and 128 lock partitions took 235
seconds to acquire locks and 22 seconds to release locks. The test
produced 203.24 TPS.

Let me know if we can provide any additional information from these
tests and if there are any other tests that we can run.

David


From: Tom Lane on
"Strong, David" <david.strong(a)unisys.com> writes:
> We have some results for you. We left the buffer partition locks at 128
> as this did not seem to be a concern and we're still using 25 backend
> processes. We ran tests for 4, 8 and 16 lock partitions.

> For 4 lock partitions, it took 620 seconds to acquire locks and 32
> seconds to release locks. The test produced 199.95 TPS.

> For 8 lock partitions, it took 505 seconds to acquire locks and 31
> seconds to release locks. The test produced 201.16 TPS.

> For 16 lock partitions, it took 362 seconds to acquire locks and 22
> seconds to release locks. The test produced 200.75 TPS.

> And, just for grins, using 128 buffer and 128 lock partitions took 235
> seconds to acquire locks and 22 seconds to release locks. The test
> produced 203.24 TPS.

[ itch... ] I can't help thinking there's something wrong with this;
the wait-time measurements seem sane, but why is there essentially no
change in the TPS result?

The above numbers are only for the lock-partition LWLocks, right?
What are the totals --- that is, how much time is spent blocked
vs. processing overall?

regards, tom lane


From: Jim Nasby on
On Sep 13, 2006, at 2:46 PM, Strong, David wrote:
> We have some results for you. We left the buffer partition locks at
> 128
> as this did not seem to be a concern and we're still using 25 backend
> processes. We ran tests for 4, 8 and 16 lock partitions.

Isn't having more lock partitions than buffer partitions pointless?
--
Jim Nasby jim(a)nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)



From: Tom Lane on
Jim Nasby <jim(a)nasby.net> writes:
> Isn't having more lock partitions than buffer partitions pointless?

AFAIK they're pretty orthogonal. It's true though that a typical
transaction doesn't hold all that many locks, which is why I don't
see a need for a large number of lock partitions.
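Schematically, and with purely illustrative names and values (this is not
the actual source), the two settings partition two independent hash
tables, each keyed on its own tag hash, which is why they don't interact
directly:

/*
 * Illustrative only -- not the actual PostgreSQL code.  The buffer
 * mapping table and the lock manager's shared hash table are separate
 * structures: each hashes its own kind of tag and takes that hash
 * modulo its own partition count to pick a partition lock.
 */
#define NUM_BUFFER_PARTITIONS  16     /* example values only */
#define NUM_LOCK_PARTITIONS    16

/* which buffer-mapping partition covers a given buffer-tag hash? */
static unsigned int
buffer_partition_for(unsigned int buftag_hash)
{
    return buftag_hash % NUM_BUFFER_PARTITIONS;
}

/* which lock-manager partition covers a given lock-tag hash? */
static unsigned int
lock_partition_for(unsigned int locktag_hash)
{
    return locktag_hash % NUM_LOCK_PARTITIONS;
}

Since a typical transaction touches only a handful of distinct lock tags,
the lock-manager table rarely benefits from many partitions.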

regards, tom lane
