Bug: Buffer cache is not scan resistant [PgSql]

Prev: xlogViewer / xlogdump
Next: CVS corruption/mistagging?

From: Gavin Sherry on 5 Mar 2007 00:16

On Mon, 5 Mar 2007, Mark Kirkwood wrote:

> To add a little to this - forgetting the scan resistant point for the
> moment... cranking down shared_buffers to be smaller than the L2 cache
> seems to help *any* sequential scan immensely, even on quite modest HW:
>
> e.g: PIII 1.26Ghz 512Kb L2 cache, 2G ram,
>
> SELECT count(*) FROM lineitem (which is about 11GB) performance:
>
> Shared_buffers Elapsed
> -------------- -------
> 400MB 101 s
> 128KB 74 s
>
> When I've profiled this activity, I've seen a lot of time spent
> searching for/allocating a new buffer for each page being fetched.
> Obviously having less of them to search through will help, but having
> less than the L2 cache-size worth of 'em seems to help a whole lot!

Could you demonstrate that point by showing us timings for shared_buffers
sizes from 512K up to, say, 2 MB? The two numbers you give there might
just have to do with managing a large buffer.

Thanks,

Gavin

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

From: "Luke Lonergan" on 5 Mar 2007 00:52

Gavin, Mark,

> Could you demonstrate that point by showing us timings for
> shared_buffers sizes from 512K up to, say, 2 MB? The two
> numbers you give there might just have to do with managing a
> large buffer.

I suggest two experiments that we've already done:
1) increase shared buffers to double the L2 cache size, you should see
that the behavior reverts to the "slow" performance and is constant at
larger sizes

2) instrument the calls to BufferGetPage() (a macro) and note that the
buffer block numbers returned increase sequentially during scans of
tables larger than the buffer size

- Luke

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

From: Tom Lane on 5 Mar 2007 01:25

Gavin Sherry <swm(a)alcove.com.au> writes:
> Could you demonstrate that point by showing us timings for shared_buffers
> sizes from 512K up to, say, 2 MB? The two numbers you give there might
> just have to do with managing a large buffer.

Using PG CVS HEAD on 64-bit Intel Xeon (1MB L2 cache), Fedora Core 5,
I don't measure any noticeable difference in seqscan speed for
shared_buffers set to 32MB or 256kB. I note that the code would
not let me choose the latter setting without a large decrease in
max_connections, which might be expected to cause some performance
changes in itself.

Now this may only prove that the disk subsystem on this machine is
too cheap to let the system show any CPU-related issues. I'm seeing
a scan rate of about 43MB/sec for both count(*) and plain ol' "wc",
which is a factor of 4 or so less than Mark's numbers suggest...
but "top" shows CPU usage of less than 5%, so even with a 4x faster
disk I'd not really expect that CPU speed would become interesting.

(This is indeed a milestone, btw, because it wasn't so long ago that
count(*) was nowhere near disk speed.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

From: Tom Lane on 5 Mar 2007 03:24

Grzegorz Jaskiewicz <gj(a)pointblue.com.pl> writes:
> On Mar 5, 2007, at 2:36 AM, Tom Lane wrote:
>> I'm also less than convinced that it'd be helpful for a big seqscan:
>> won't reading a new disk page into memory via DMA cause that memory to
>> get flushed from the processor cache anyway?

> Nope. DMA is writing directly into main memory. If the area was in
> the L2/L1 cache, it will get invalidated. But if it isn't there, it
> is okay.

So either way, it isn't in processor cache after the read. So how can
there be any performance benefit?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

From: "Luke Lonergan" on 5 Mar 2007 03:28

> So either way, it isn't in processor cache after the read.
> So how can there be any performance benefit?

It's the copy from kernel IO cache to the buffer cache that is L2
sensitive. When the shared buffer cache is polluted, it thrashes the L2
cache. When the number of pages being written to in the kernel->user
space writes fits in L2, then the L2 lines are "written through" (see
the link below on page 264 for the write combining features of the
opteron for example) and the writes to main memory are deferred.

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/
25112.PDF

- Luke

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12
Prev: xlogViewer / xlogdump
Next: CVS corruption/mistagging?