From: Tom Lane on 12 Mar 2007 10:30

ITAGAKI Takahiro <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> I tested your patch with VACUUM FREEZE. The performance was improved when
> I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> but this patch should be useful for normal VACUUMs with background jobs!

Proving that you can see a difference in a worst-case scenario is not the
same as proving that the patch is useful in normal cases.

			regards, tom lane
From: "Simon Riggs" on 12 Mar 2007 14:40

On Mon, 2007-03-12 at 10:30 -0400, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki.takahiro(a)oss.ntt.co.jp> writes:
> > I tested your patch with VACUUM FREEZE. The performance was improved when
> > I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> > but this patch should be useful for normal VACUUMs with background jobs!
>
> Proving that you can see a difference in a worst-case scenario is not the
> same as proving that the patch is useful in normal cases.

I agree, but I think this VACUUM FREEZE test does represent the normal
case. Here's why:

The poor buffer-manager behaviour occurs when a block is dirty as a result
of WAL-logged changes, and it only takes the removal of a single row for us
to have WAL-logged a change and dirtied the block. When autovacuum invokes
VACUUM, it does so by default once 20% of a table's rows have been updated,
which means that for many distributions of UPDATEs we will have touched a
very large proportion of the table's blocks before VACUUM runs. That isn't
true for *all* distributions of UPDATEs, but it soon will be: the Dead
Space Map will deliver only dirty blocks to VACUUM.

So running a VACUUM FREEZE is a reasonable simulation of running a large
VACUUM on a real production system with default autovacuum enabled, as will
be the case for 8.3. It is possible to run VACUUM when fewer dirty blocks
have been generated, but that won't be the common situation and isn't
something we should optimise for.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com
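
For reference, the "20% of the rows" figure above corresponds to autovacuum's
trigger condition: a table is vacuumed once its dead rows exceed a small base
threshold plus a scale factor times the table size. The sketch below shows
that arithmetic only; the function name and the base threshold of 50 are
illustrative assumptions (the shipped default varies by release), while the
0.2 scale factor is the figure cited in the message above.

/*
 * Sketch of the autovacuum trigger arithmetic behind the "20% of the rows"
 * figure.  The real check lives in autovacuum.c; the numeric defaults used
 * here are assumptions for illustration, not necessarily the shipped values.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
table_needs_vacuum(double reltuples,      /* live rows in the table */
                   double dead_tuples,    /* rows updated/deleted since last vacuum */
                   double base_threshold, /* autovacuum_vacuum_threshold */
                   double scale_factor)   /* autovacuum_vacuum_scale_factor */
{
    /* vacuum when dead rows exceed base + fraction-of-table */
    return dead_tuples > base_threshold + scale_factor * reltuples;
}

int
main(void)
{
    /*
     * With scale_factor = 0.2, a 1M-row table is vacuumed only after roughly
     * 200,000 rows have been updated or deleted -- by which point many of its
     * blocks are already dirty, as argued above.
     */
    printf("%d\n", table_needs_vacuum(1000000.0, 200001.0, 50.0, 0.2));
    return 0;
}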
From: ITAGAKI Takahiro on 13 Mar 2007 00:40

"Simon Riggs" <simon(a)2ndquadrant.com> wrote:

> > > With the default
> > > value of scan_recycle_buffers (=0), VACUUM seems to use all of the buffers
> > > in the pool, just like existing sequential scans. Is this intended?
>
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.

This is a result with scan_recycle_buffers.v3.patch. This time I used a
normal VACUUM with background load from a slowed-down pgbench. I believe
the patch is useful in normal cases, not only for VACUUM FREEZE.

    N  |  time  | WAL flush (*)
  -----+--------+---------------
     0 | 112.8s | 44.3%
     1 | 148.9s | 52.1%
     8 | 105.1s | 17.6%
    16 |  96.9s |  8.7%
    32 | 103.9s |  6.3%
    64 |  89.4s |  6.6%
   128 |  80.0s |  3.8%

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
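
For readers following the numbers: scan_recycle_buffers appears to bound
VACUUM to a small ring of N buffers that it reuses in rotation rather than
letting it spread across the whole pool. The sketch below is an illustrative
model of such a ring, not the patch itself; all names are invented. The
comment inside points at one reason the measured WAL flush percentage can
drop as N grows, consistent with the table above.

/*
 * Illustrative sketch of a small "recycle ring" of buffers such as
 * scan_recycle_buffers appears to control.  Invented names; not the patch.
 */
#include <stdio.h>

#define RING_SIZE 8              /* stands in for scan_recycle_buffers */

typedef struct
{
    int buf_ids[RING_SIZE];      /* buffer ids owned by the ring */
    int next;                    /* next slot to hand out */
} ScanRing;

/* Return the next buffer slot to (re)use for the scan. */
static int
ring_next_buffer(ScanRing *ring)
{
    int slot = ring->next;

    ring->next = (ring->next + 1) % RING_SIZE;

    /*
     * If the buffer in this slot is still dirty when the scan comes back to
     * it, the WAL-before-data rule forces a WAL flush up to that page's LSN
     * before the page can be written out and reused.  A larger ring gives
     * background WAL flushing time to catch up first, which is consistent
     * with the falling "WAL flush" percentages measured above.
     */
    return ring->buf_ids[slot];
}

int
main(void)
{
    ScanRing ring = {{0, 1, 2, 3, 4, 5, 6, 7}, 0};
    int i;

    /* A 20-block scan touches only the same 8 buffers over and over. */
    for (i = 0; i < 20; i++)
        printf("block %d -> buffer %d\n", i, ring_next_buffer(&ring));
    return 0;
}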
From: "Luke Lonergan" on 13 Mar 2007 01:16

Simon,

You may know we've built something similar and have seen similar gains.
We're planning a modification that I think you should consider: when there
is a sequential scan of a table larger than the size of shared_buffers, we
are allowing the scan to write through the shared_buffers cache.

The hypothesis is that if a relation is of a size equal to or less than the
size of shared_buffers, it is "cacheable" and should use the standard LRU
approach to provide for reuse.

- Luke

On 3/12/07 3:08 AM, "Simon Riggs" <simon(a)2ndquadrant.com> wrote:

> On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
>> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>
>>> With the default
>>> value of scan_recycle_buffers (=0), VACUUM seems to use all of the buffers
>>> in the pool, just like existing sequential scans. Is this intended?
>>
>> Yes, but it's not very useful for testing to have done that. I'll do
>> another version within the hour that sets N=0 (only) back to current
>> behaviour for VACUUM.
>
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.
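
A minimal sketch of the cacheability heuristic described in the second
paragraph above: compare the relation's size in blocks against shared_buffers
and only bypass the normal LRU behaviour when the relation could not fit in
the cache anyway. The enum, function name, and example sizes are assumptions
for illustration, not the implementation being discussed.

/* Sketch of the "cacheable if it fits in shared_buffers" heuristic. */
#include <stdio.h>

typedef enum
{
    STRATEGY_NORMAL_LRU,    /* relation fits in cache: let it compete normally */
    STRATEGY_RECYCLE_RING   /* relation cannot fit: recycle a small ring instead */
} ScanStrategy;

static ScanStrategy
choose_seqscan_strategy(long rel_blocks, long shared_buffer_blocks)
{
    if (rel_blocks <= shared_buffer_blocks)
        return STRATEGY_NORMAL_LRU;     /* "cacheable": worth keeping around */

    /* Larger than the cache: don't evict everyone else's working set. */
    return STRATEGY_RECYCLE_RING;
}

int
main(void)
{
    /* e.g. a 1 GB relation (131072 8kB blocks) vs. 32000 blocks of shared_buffers */
    printf("%d\n", choose_seqscan_strategy(131072L, 32000L));
    return 0;
}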
From: "Simon Riggs" on 13 Mar 2007 05:37
On Mon, 2007-03-12 at 22:16 -0700, Luke Lonergan wrote:

> You may know we've built something similar and have seen similar gains.

Cool

> We're planning a modification that I think you should consider: when there
> is a sequential scan of a table larger than the size of shared_buffers, we
> are allowing the scan to write through the shared_buffers cache.

Write? For which operations? I was thinking of doing this for bulk writes
too, but it would require changes to the bgwriter's cleaning sequence. Are
you saying to write, say, ~32 buffers and then fsync them, rather than
letting the bgwriter do that, and then allow those buffers to be reused?

> The hypothesis is that if a relation is of a size equal to or less than the
> size of shared_buffers, it is "cacheable" and should use the standard LRU
> approach to provide for reuse.

Sounds reasonable. Please say more.

--
 Simon Riggs
 EnterpriseDB   http://www.enterprisedb.com
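
A rough sketch of the batched write-and-reuse approach Simon is asking about,
under the assumption that "write ~32 buffers then fsync them" means syncing
each small batch before recycling its buffers for the next chunk of a bulk
write. The helper names are invented stand-ins, not PostgreSQL's bufmgr API.

/* Sketch: write a bulk load in small batches, sync each batch, reuse buffers. */
#include <stdio.h>

#define BATCH_SIZE 32           /* the "~32 buffers" mentioned above */

static void
write_page(int page_no)
{
    /* stand-in for handing a dirty buffer's page to the OS */
    printf("write page %d\n", page_no);
}

static void
sync_batch(void)
{
    /* stand-in for fsync()ing the pages written so far */
    printf("fsync batch\n");
}

static void
bulk_write(int npages)
{
    int i;

    for (i = 0; i < npages; i++)
    {
        write_page(i);

        /*
         * Every ~32 pages, force them to disk so the same buffers can be
         * reused immediately, instead of leaving the cleaning to bgwriter.
         */
        if ((i + 1) % BATCH_SIZE == 0)
            sync_batch();
    }

    if (npages % BATCH_SIZE != 0)
        sync_batch();           /* flush the final partial batch */
}

int
main(void)
{
    bulk_write(100);
    return 0;
}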