From: Tom Lane on 12 Sep 2006 09:34

Simon Riggs <simon(a)2ndquadrant.com> writes:
> On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote:
>> Great, thanks. The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in
>> src/include/storage/lwlock.h. You need a full backend recompile
>> after changing it, but you shouldn't need to initdb, if that helps.

> IIRC we did that already and the answer was 16...

No, no one has shown me any numbers from any "real" tests (anything more than pgbench on a Dell PC ...).

			regards, tom lane
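For reference, the knobs under discussion live in src/include/storage/lwlock.h. In the 8.2-era tree they look roughly like the excerpt below (approximate and from memory; check your own source tree before editing):

/* src/include/storage/lwlock.h (approximate 8.2-era excerpt) */

/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS  16

/* Number of partitions of the shared lock tables */
#define LOG2_NUM_LOCK_PARTITIONS  4
#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)

Raising LOG2_NUM_LOCK_PARTITIONS to 7 is what gives the 128 lock partitions discussed below; as Tom notes, changing it requires a full backend recompile but no initdb.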
From: "Strong, David" on 12 Sep 2006 11:46 We can pass on what we've seen when running tests here with different BufMapping and LockMgr partition sizes. We use a TPC-C inspired benchmark. Currently it is configured to run 25 backend processes. The test runs for 16 minutes as this is the minimum amount of time we can run and obtain useful information. This gives us 24,000 seconds (25 * 16 * 60) of processing time. The following timings have been rounded to the nearest second and represent the amount of time amongst all backend processes to acquire and release locks. For example, a value of 2500 seconds would mean each backend process (25) took ~100 seconds to acquire or release a lock. Although, in reality, the time spent locking or releasing each partition entry is not uniform and there are some definite hotspot entries. We can pass on some of the lock output if anyone is interested. When using 16 buffer and 16 lock partitions, we see that BufMapping takes 809 seconds to acquire locks and 174 seconds to release locks. The LockMgr takes 362 seconds to acquire locks and 26 seconds to release locks. When using 128 buffer and 128 lock partitions, we see that BufMapping takes 277 seconds (532 seconds improvement) to acquire locks and 78 seconds (96 seconds improvement) to release locks. The LockMgr takes 235 seconds (127 seconds improvement) to acquire locks and 22 seconds (4 seconds improvement) to release locks. Overall, 128 BufMapping partitions improves locking/releasing by 678 seconds, 128 LockMgr partitions improves locking/releasing by 131 seconds. With the improvements in the various locking times, one might expect an improvement in the overall benchmark result. However, a 16 partition run produces a result of 198.74 TPS and a 128 partition run produces a result of 203.24 TPS. Part of the time saved from BufMapping and LockMgr partitions is absorbed into the WALInsertLock lock. For a 16 partition run, the total time to lock/release the WALInsertLock lock is 5845 seconds. For 128 partitions, the WALInsertLock lock takes 6172 seconds, an increase of 327 seconds. Perhaps we have our WAL configured incorrectly? Other static locks are also affected, but not as much as the WALInsertLock lock. For example, the ProcArrayLock lock increases from 337 seconds to 348 seconds. The SInvalLock lock increases from 317 seconds to 331 seconds. Due to expansion of time in other locks, a 128 partition run only spends 403 seconds less in locking than a 16 partition run. We can generate some OProfile statistics, but most of the time saved is probably absorbed into functions such as HeapTupleSatisfiesSnapshot and PinBuffer which seem to have a very high overhead. David -----Original Message----- From: pgsql-hackers-owner(a)postgresql.org [mailto:pgsql-hackers-owner(a)postgresql.org] On Behalf Of Simon Riggs Sent: Tuesday, September 12, 2006 1:37 AM To: Tom Lane Cc: Mark Wong; Bruce Momjian; PostgreSQL-development Subject: Re: [HACKERS] Lock partitions On Mon, 2006-09-11 at 11:29 -0400, Tom Lane wrote: > Mark Wong <markw(a)osdl.org> writes: > > Tom Lane wrote: > >> It would be nice to see some results from the OSDL tests with, say, 4, > >> 8, and 16 lock partitions before we forget about the point though. > >> Anybody know whether OSDL is in a position to run tests for us? > > > Yeah, I can run some dbt2 tests in the lab. I'll get started on it. > > We're still a little bit away from getting the automated testing for > > PostgreSQL going again though. > > Great, thanks. 
The thing to twiddle is LOG2_NUM_LOCK_PARTITIONS in > src/include/storage/lwlock.h. You need a full backend recompile > after changing it, but you shouldn't need to initdb, if that helps. IIRC we did that already and the answer was 16... -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo(a)postgresql.org so that your message can get through to the mailing list cleanly
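As a rough sanity check of what those aggregate figures mean per backend, here is a small back-of-the-envelope program (the figures are taken from the message above; the program itself is only an illustration):

#include <stdio.h>

int
main(void)
{
    const double backends = 25;
    const double run_seconds = 16 * 60;            /* 960 s of wall time per backend */
    const double budget = backends * run_seconds;  /* 24,000 s total processing time */

    /* BufMapping acquire times reported above */
    const double acquire_16 = 809;                 /* 16 partitions */
    const double acquire_128 = 277;                /* 128 partitions */

    printf("16 partitions:  %.1f s/backend, %.1f%% of the run\n",
           acquire_16 / backends, 100.0 * acquire_16 / budget);
    printf("128 partitions: %.1f s/backend, %.1f%% of the run\n",
           acquire_128 / backends, 100.0 * acquire_128 / budget);
    return 0;
}

This prints roughly 32.4 s/backend (3.4% of the run) at 16 partitions versus 11.1 s/backend (1.2%) at 128, which is consistent with the fairly modest change in overall TPS.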
From: Tom Lane on 12 Sep 2006 12:40

"Strong, David" <david.strong(a)unisys.com> writes:
> When using 16 buffer and 16 lock partitions, we see that BufMapping
> takes 809 seconds to acquire locks and 174 seconds to release locks. The
> LockMgr takes 362 seconds to acquire locks and 26 seconds to release
> locks.

> When using 128 buffer and 128 lock partitions, we see that BufMapping
> takes 277 seconds (532 seconds improvement) to acquire locks and 78
> seconds (96 seconds improvement) to release locks. The LockMgr takes 235
> seconds (127 seconds improvement) to acquire locks and 22 seconds (4
> seconds improvement) to release locks.

While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous ... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once.

> With the improvements in the various locking times, one might expect an
> improvement in the overall benchmark result. However, a 16 partition run
> produces a result of 198.74 TPS and a 128 partition run produces a
> result of 203.24 TPS.

> Part of the time saved from BufMapping and LockMgr partitions is
> absorbed into the WALInsertLock lock. For a 16 partition run, the total
> time to lock/release the WALInsertLock lock is 5845 seconds. For 128
> partitions, the WALInsertLock lock takes 6172 seconds, an increase of
> 327 seconds. Perhaps we have our WAL configured incorrectly?

I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially.

			regards, tom lane
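To make the cost Tom describes concrete, here is a toy model (not actual PostgreSQL source; the real structures live in proc.h and lock.c) of why the lock partition count is not free: each backend's PGPROC carries per-partition bookkeeping, and a LockReleaseAll-style sweep visits every partition at each transaction end, even when only a handful of tables were touched.

/* Toy model only; sizes and names are illustrative, not PostgreSQL's. */
#include <stdio.h>

typedef struct { void *prev, *next; } shm_queue;    /* stand-in list header */

#define LOG2_NUM_LOCK_PARTITIONS 4                   /* 4 -> 16 partitions; 7 -> 128 */
#define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS)

typedef struct
{
    shm_queue   held_locks[NUM_LOCK_PARTITIONS];     /* grows with the partition count */
} toy_pgproc;

int
main(void)
{
    int         partition;
    int         partitions_swept = 0;

    /* LockReleaseAll-style sweep: O(partitions) work at every transaction end */
    for (partition = 0; partition < NUM_LOCK_PARTITIONS; partition++)
        partitions_swept++;

    printf("per-backend overhead: %lu bytes; %d partitions swept per transaction end\n",
           (unsigned long) sizeof(toy_pgproc), partitions_swept);
    return 0;
}

At 128 partitions both the per-backend structure and the per-commit sweep are eight times larger than at 16, which is the trade-off behind Tom's suggestion to compare 4 and 16 instead.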
From: "Strong, David" on 12 Sep 2006 13:03 Tom, Thanks for the feedback. We'll run a few tests with differing buffer and lock partition sizes in the range you're interested in and we'll let you know what we see. Our workload is not fixed, however. Our benchmark does not follow the strict TPC-C guideline of using think times etc. We throw as many transactions at the database as we can. So, when any time is freed up, we will fill it with another transaction. We simply want to stress as much as we can. As one bottleneck is removed, the time saved obviously flows to the next. Postgres 8.2 moves some of the time that used to be consumed by single BufMappingLock and LockMGRLock locks to the WALInsertLock lock. We have run tests where we made XLogInsert a NOP, because we wanted to see where the next bottleneck would be, and some of the time occupied by WALInsertLock lock was absorbed by the SInvalLock lock. We have not tried to remove the SInvalLock lock to see where time flows to next, but we might. David -----Original Message----- From: Tom Lane [mailto:tgl(a)sss.pgh.pa.us] Sent: Tuesday, September 12, 2006 9:40 AM To: Strong, David Cc: PostgreSQL-development Subject: Re: [HACKERS] Lock partitions "Strong, David" <david.strong(a)unisys.com> writes: > When using 16 buffer and 16 lock partitions, we see that BufMapping > takes 809 seconds to acquire locks and 174 seconds to release locks. The > LockMgr takes 362 seconds to acquire locks and 26 seconds to release > locks. > When using 128 buffer and 128 lock partitions, we see that BufMapping > takes 277 seconds (532 seconds improvement) to acquire locks and 78 > seconds (96 seconds improvement) to release locks. The LockMgr takes 235 > seconds (127 seconds improvement) to acquire locks and 22 seconds (4 > seconds improvement) to release locks. While I don't see any particular penalty to increasing NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very significant penalty (increasing PGPROC size as well as the work needed during LockReleaseAll, which is executed at every transaction end). I think 128 lock partitions is probably verging on the ridiculous .... particularly if your benchmark only involves touching half a dozen tables. I'd be more interested in comparisons between 4 and 16 lock partitions. Also, please vary the two settings independently rather than confusing the issue by changing them both at once. > With the improvements in the various locking times, one might expect an > improvement in the overall benchmark result. However, a 16 partition run > produces a result of 198.74 TPS and a 128 partition run produces a > result of 203.24 TPS. > Part of the time saved from BufMapping and LockMgr partitions is > absorbed into the WALInsertLock lock. For a 16 partition run, the total > time to lock/release the WALInsertLock lock is 5845 seconds. For 128 > partitions, the WALInsertLock lock takes 6172 seconds, an increase of > 327 seconds. Perhaps we have our WAL configured incorrectly? I fear this throws your entire measurement procedure into question. For a fixed workload the number of acquisitions of WALInsertLock ought to be fixed, so you shouldn't see any more contention for WALInsertLock if the transaction rate didn't change materially. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
From: Simon Riggs on 13 Sep 2006 05:22

On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote:
> "Strong, David" <david.strong(a)unisys.com> writes:
> > When using 16 buffer and 16 lock partitions, we see that BufMapping
> > takes 809 seconds to acquire locks and 174 seconds to release locks. The
> > LockMgr takes 362 seconds to acquire locks and 26 seconds to release
> > locks.
>
> > When using 128 buffer and 128 lock partitions, we see that BufMapping
> > takes 277 seconds (532 seconds improvement) to acquire locks and 78
> > seconds (96 seconds improvement) to release locks. The LockMgr takes 235
> > seconds (127 seconds improvement) to acquire locks and 22 seconds (4
> > seconds improvement) to release locks.
>
> While I don't see any particular penalty to increasing
> NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very
> significant penalty (increasing PGPROC size as well as the work needed
> during LockReleaseAll, which is executed at every transaction end).
> I think 128 lock partitions is probably verging on the ridiculous
> ... particularly if your benchmark only involves touching half a dozen
> tables. I'd be more interested in comparisons between 4 and 16 lock
> partitions. Also, please vary the two settings independently rather
> than confusing the issue by changing them both at once.

Good thinking, David. Even if 128 is fairly high, it does seem worth exploring higher values - I was just stuck in "fewer == better" thoughts.

> > With the improvements in the various locking times, one might expect an
> > improvement in the overall benchmark result. However, a 16 partition run
> > produces a result of 198.74 TPS and a 128 partition run produces a
> > result of 203.24 TPS.
>
> > Part of the time saved from BufMapping and LockMgr partitions is
> > absorbed into the WALInsertLock lock. For a 16 partition run, the total
> > time to lock/release the WALInsertLock lock is 5845 seconds. For 128
> > partitions, the WALInsertLock lock takes 6172 seconds, an increase of
> > 327 seconds. Perhaps we have our WAL configured incorrectly?
>
> I fear this throws your entire measurement procedure into question. For
> a fixed workload the number of acquisitions of WALInsertLock ought to be
> fixed, so you shouldn't see any more contention for WALInsertLock if the
> transaction rate didn't change materially.

David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue.

It seems reasonable that relieving the bottleneck on the BufMapping and LockMgr locks would simply make us queue longer on the next bottleneck, WALInsertLock. So again, those tests seem reasonable to me so far.

These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer?

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
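On the timer question: a "lightweight timer" here presumably means something cheaper than a kernel call per lock operation. One hypothetical approach (purely illustrative, x86/GCC-specific, and not necessarily what David's instrumentation does) is to read the CPU timestamp counter around each acquire and convert cycles to seconds offline:

#include <stdint.h>
#include <stdio.h>

/* Read the x86 timestamp counter; far cheaper than a gettimeofday() call. */
static inline uint64_t
read_tsc(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical per-backend accumulators, one slot per lock of interest. */
#define MAX_TRACKED_LOCKS 256
static uint64_t acquire_cycles[MAX_TRACKED_LOCKS];
static uint64_t acquire_counts[MAX_TRACKED_LOCKS];

/* Wrap a lock-acquire call and accumulate its cost. */
static void
timed_acquire(int lockid, void (*acquire) (int))
{
    uint64_t    start = read_tsc();

    acquire(lockid);
    acquire_cycles[lockid] += read_tsc() - start;
    acquire_counts[lockid]++;
}

/* demo stub so the sketch compiles and runs on its own */
static void dummy_acquire(int lockid) { (void) lockid; }

int
main(void)
{
    timed_acquire(0, dummy_acquire);
    printf("lock 0: %llu cycles over %llu acquisitions\n",
           (unsigned long long) acquire_cycles[0],
           (unsigned long long) acquire_counts[0]);
    return 0;
}

Keeping an acquisition count alongside the accumulated time, as the sketch does, would also answer Tom's question about whether the number of WALInsertLock acquisitions actually changed between the 16- and 128-partition runs.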