From: "Simon Riggs" on
On Tue, 2007-03-06 at 22:32 -0500, Luke Lonergan wrote:
> Incidentally, we tried triggering NTA (L2 cache bypass)
> unconditionally and in various patterns and did not see the
> substantial gain as with reducing the working set size.
>
> My conclusion: Fixing the OS is not sufficient to alleviate the issue.
> We see a 2x penalty (1700MB/s versus 3500MB/s) at the higher data
> rates due to this effect.
>
I've implemented buffer recycling, as previously described, patch being
posted now to -patches as "scan_recycle_buffers".

This version includes buffer recycling

- for SeqScans larger than shared buffers, with the objective of
improving L2 cache efficiency *and* reducing the effects of shared
buffer cache spoiling (both as previously discussed on this thread)

- for VACUUMs of any size, with the objective of reducing WAL thrashing
whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
originally suggested by Itagaki-san, just with a different
implementation).

Behaviour is not activated by default in this patch. To request buffer
recycling, set the USERSET GUC:

  SET scan_recycle_buffers = N

I have tested N = 1, 4, 8 and 16, but only N > 8 seems sensible, IMHO.

The patch affects StrategyGetBuffer, so it only affects the disk->cache path.
The idea is that if a block is already in the shared buffer cache then we
already get a substantial benefit and nothing else is needed. So for the
general case, the patch adds just a single if-test to the I/O path.
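
To illustrate the approach, here is a standalone sketch (not the patch code
itself; the RecycleRing structure and get_scan_buffer are invented for
illustration). The effect of the if-test is roughly: reuse the next slot of a
small private ring when one has been requested, otherwise take a buffer from
the main pool as before.

/*
 * Standalone sketch (not the patch itself; get_scan_buffer and the
 * ring structure are invented for illustration).  A large scan
 * normally takes a fresh victim buffer from the main pool for every
 * page it reads, cycling through the whole pool and the CPU cache.
 * With recycling, the scan reuses a small private ring of N buffers,
 * so its working set stays tiny and the rest of the pool is left
 * alone.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

typedef struct RecycleRing
{
    int     nbuffers;   /* scan_recycle_buffers; 0 means disabled */
    int     next;       /* next slot to reuse */
    char  **slots;      /* the ring of buffers */
} RecycleRing;

static RecycleRing *
ring_create(int nbuffers)
{
    RecycleRing *ring = malloc(sizeof(RecycleRing));

    ring->nbuffers = nbuffers;
    ring->next = 0;
    ring->slots = calloc(nbuffers, sizeof(char *));
    return ring;
}

/*
 * Analogue of the single if-test in the allocation path: if the scan
 * has a ring, reuse its next slot; otherwise fall back to taking a
 * new buffer from the main pool (here, a plain malloc).
 */
static char *
get_scan_buffer(RecycleRing *ring)
{
    if (ring != NULL && ring->nbuffers > 0)
    {
        char  **slot = &ring->slots[ring->next];

        ring->next = (ring->next + 1) % ring->nbuffers;
        if (*slot == NULL)
            *slot = malloc(BLCKSZ);     /* first lap: populate the ring */
        return *slot;                   /* later laps: recycle */
    }
    return malloc(BLCKSZ);
}

int
main(void)
{
    RecycleRing *ring = ring_create(16);        /* scan_recycle_buffers = 16 */

    /* "Scan" 1000 pages; only 16 distinct buffers are ever touched. */
    for (int page = 0; page < 1000; page++)
        memset(get_scan_buffer(ring), 0, BLCKSZ);

    printf("scanned 1000 pages with a ring of %d buffers\n", ring->nbuffers);
    return 0;
}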

The parameter is picked up at the start of SeqScan and VACUUM
(currently). Any change mid-scan will be ignored.

IMHO it's possible to do this and to allow Synch Scans at the same time,
with some thought. There is no need for us to rely on the cache-spoiling
behaviour of scans to implement that feature as well.

Independent performance tests requested, so that we can discuss this
objectively.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com




From: "Luke Lonergan" on
Cool!

- Luke

Message is short because I'm on my Treo.


From: ITAGAKI Takahiro

"Simon Riggs" <simon(a)2ndquadrant.com> wrote:

> I've implemented buffer recycling, as previously described, patch being
> posted now to -patches as "scan_recycle_buffers".
>
> - for VACUUMs of any size, with the objective of reducing WAL thrashing
> whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
> originally suggested by Itagaki-san, just with a different
> implementation).

I tested your patch with VACUUM FREEZE. Performance improved when I set
scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
but this patch should also be useful for normal VACUUMs run as background jobs!

N | time | WAL flush(*)
-----+-------+-----------
0 | 58.7s | 0.01%
1 | 80.3s | 81.76%
8 | 73.4s | 16.73%
16 | 64.2s | 9.24%
32 | 59.0s | 4.88%
64 | 56.7s | 2.63%
128 | 55.1s | 1.41%

(*) WAL flush is the fraction of buffer recycles that needed a WAL fsync.

# SET scan_recycle_buffers = 0;
# UPDATE accounts SET aid=aid WHERE random() < 0.005;
# CHECKPOINT;
# SET scan_recycle_buffers = <N>;
# VACUUM FREEZE accounts;
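
To picture where the WAL flush column comes from, here is a rough toy model
(plain C, not the patch code, and not calibrated to the numbers above): before
a dirty page can be recycled its WAL must already be durable, so a small ring
keeps forcing its own WAL flushes, while a larger ring usually lets the
periodic background flush get there first.

/*
 * Toy model (not the patch code; numbers are illustrative only):
 * why a larger recycle ring needs fewer forced WAL flushes.
 *
 * Each step "VACUUM" dirties one page and stamps it with the LSN of
 * that step.  Before a ring slot can be reused, WAL must be durable
 * past the LSN of the page it still holds (WAL-before-data).  A
 * background WAL flush is assumed to run every WAL_INTERVAL steps;
 * any recycle it has not covered forces a flush of its own.
 */
#include <stdio.h>

#define STEPS        100000L   /* pages processed by the "VACUUM" */
#define WAL_INTERVAL 100       /* assumed background WAL flush period */
#define MAX_RING     128

static double
forced_flush_ratio(int ring_size)
{
    long    page_lsn[MAX_RING] = {0};   /* LSN stamped on each slot */
    long    flushed_lsn = 0;            /* WAL durable up to here */
    long    forced = 0;

    for (long step = 1; step <= STEPS; step++)
    {
        int     slot = (int) (step % ring_size);

        /* background WAL writer keeps up periodically */
        if (step % WAL_INTERVAL == 0)
            flushed_lsn = step;

        /* reusing the slot requires durable WAL past its old page */
        if (page_lsn[slot] > flushed_lsn)
        {
            forced++;
            flushed_lsn = step;         /* flush everything written so far */
        }

        page_lsn[slot] = step;          /* slot now holds this step's page */
    }
    return 100.0 * forced / STEPS;
}

int
main(void)
{
    int     sizes[] = {1, 8, 16, 32, 64, 128};

    for (int i = 0; i < 6; i++)
        printf("N = %3d: %5.2f%% of recycles forced a WAL flush\n",
               sizes[i], forced_flush_ratio(sizes[i]));
    return 0;
}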


BTW, does the patch change the default buffer usage in VACUUM? From what
I've seen, scan_recycle_buffers = 1 is the same as before. With the default
value of scan_recycle_buffers (= 0), VACUUM seems to use all of the buffers
in the pool, just like existing sequential scans. Is this intended?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




From: "Simon Riggs" on
On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon(a)2ndquadrant.com> wrote:
>
> > I've implemented buffer recycling, as previously described, patch being
> > posted now to -patches as "scan_recycle_buffers".
> >
> > - for VACUUMs of any size, with the objective of reducing WAL thrashing
> > whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
> > originally suggested by Itagaki-san, just with a different
> > implementation).
>
> I tested your patch with VACUUM FREEZE. Performance improved when I set
> scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> but this patch should also be useful for normal VACUUMs run as background jobs!

Thanks.

> N | time | WAL flush(*)
> -----+-------+-----------
> 0 | 58.7s | 0.01%
> 1 | 80.3s | 81.76%
> 8 | 73.4s | 16.73%
> 16 | 64.2s | 9.24%
> 32 | 59.0s | 4.88%
> 64 | 56.7s | 2.63%
> 128 | 55.1s | 1.41%
>
> (*) WAL flush is the fraction of buffer recycles that needed a WAL fsync.

Do you have the same measurement without the patch applied? I'd also be
interested in the current state (in this patch, the N=0 path is modified
for VACUUM as well).

> # SET scan_recycle_buffers = 0;
> # UPDATE accounts SET aid=aid WHERE random() < 0.005;
> # CHECKPOINT;
> # SET scan_recycle_buffers = <N>;
> # VACUUM FREEZE accounts;

Very good results, thanks. I'd be interested in the same thing for just
VACUUM and for varying ratios of dirty/clean blocks during vacuum.

> BTW, does the patch change the default buffer usage in VACUUM? From what
> I've seen, scan_recycle_buffers = 1 is the same as before.

That was the intention.

> With the default
> value of scan_recycle_buffers (= 0), VACUUM seems to use all of the buffers
> in the pool, just like existing sequential scans. Is this intended?

Yes, but it's not very useful for testing to have done that. I'll do
another version within the hour that sets N=0 (only) back to the current
behaviour for VACUUM.

One of the objectives of the patch was to prevent VACUUM from tripping
up other backends. I'm confident that it will improve that situation for
OLTP workloads running regular concurrent VACUUMs, but we will need to
wait a couple of days to get those results in also.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com




From: "Simon Riggs" on
On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:

> > With the default
> > value of scan_recycle_buffers (= 0), VACUUM seems to use all of the buffers
> > in the pool, just like existing sequential scans. Is this intended?
>
> Yes, but it's not very useful for testing to have done that. I'll do
> another version within the hour that sets N=0 (only) back to the current
> behaviour for VACUUM.

New test version enclosed, where scan_recycle_buffers = 0 doesn't change
existing VACUUM behaviour.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
