From: Heikki Linnakangas on
Pavan Deolasee wrote:
> I have just counted the number of read/write calls on the CLOG blocks. As
> you can see, the total number of CLOG reads jumped from 545323 to 1181851,
> i.e. 1181851 - 545323 = 636528 CLOG block reads for 1554697 pages of the
> stock table.

Hmm. So there is some activity there. Could you modify the patch to
count how many of those reads came from the OS cache? I'm thinking of doing
a gettimeofday() call before and after each read, and counting how many
calls finished in, say, less than 1 ms. Also, summing up the total time
spent in reads would be interesting.
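
Something like the sketch below is what I have in mind (untested, standalone,
not wired into slru.c; the counters and the 1 ms cutoff are just placeholders):

#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

static long   slow_reads = 0;        /* reads that took 1 ms or more */
static double total_read_secs = 0.0; /* total wall-clock time spent in read() */

static ssize_t
timed_read(int fd, void *buf, size_t len)
{
    struct timeval before, after;
    ssize_t nread;
    double  elapsed;

    gettimeofday(&before, NULL);
    nread = read(fd, buf, len);
    gettimeofday(&after, NULL);

    elapsed = (after.tv_sec - before.tv_sec) +
              (after.tv_usec - before.tv_usec) / 1000000.0;

    total_read_secs += elapsed;
    if (elapsed >= 0.001)   /* probably a real disk read, not OS cache */
        slow_reads++;

    return nread;
}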

Or, would it be possible to put the clog on a different drive, and use
iostat to get the numbers?

> This figure is only indicative since it also includes the CLOG block reads
> which would have happened as part of other backend operations (VACUUM took
> almost 55 minutes to complete). Still, in the first 210 minutes of the run,
> the total reads were only 545323. So most of the 636528 reads in the next
> 55 minutes can be attributed to VACUUM.

Actually, clog reads during normal activity are even worse.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" on
On Wed, 2007-01-24 at 09:32 +0530, Pavan Deolasee wrote:

> On a typical desktop-class 2-CPU Dell machine, we have seen pgbench
> clocking more than 1500 tps. That implies CLOG would get filled up in
> roughly 262144/1500 ~ 175 seconds. VACUUM on the accounts table takes
> much longer to trigger.

You assume that all of the top-level transactions have no subtransactions.
In that test, subtransactions are in use because of the EXCEPTION clause in
the PL/pgSQL used, so that should at least double the number of Xids.
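
For reference, the 262144 figure is just the default clog window, assuming
8 kB blocks and the default NUM_CLOG_BUFFERS of 8 (taking those on faith):

    2 bits per transaction  -> 4 statuses per byte
    8192 bytes per page     -> 32768 Xids per clog page
    8 pages buffered        -> 8 * 32768 = 262144 Xids in the buffers
    at 1500 tps             -> 262144 / 1500 ~ 175 seconds to cycle through

so if the EXCEPTION block doubles the Xid consumption, that window drops to
roughly 87 seconds.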

> So most of the 636528 reads in the next 55 minutes can be attributed to
> VACUUM.

A similar argument might also be applied to subtrans, so a similar
investigation seems worthwhile. Subtrans has space for fewer Xids than
clog, BTW.
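
Roughly, if I remember the subtrans constants right (a 4-byte parent xid per
transaction, 32 buffers by default), treat the exact figures as from memory:

    4 bytes per Xid  -> 2048 Xids per subtrans page (vs. 32768 for clog)
    32 pages cached  -> 32 * 2048 = 65536 Xids covered, vs. 262144 for clog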

OTOH, I do think that 99% of that will not cause I/O.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com




From: "Pavan Deolasee" on
On 1/24/07, Heikki Linnakangas <heikki(a)enterprisedb.com> wrote:
>
> Pavan Deolasee wrote:
> > I have just counted the number of read/write calls on the CLOG blocks. As
> > you can see, the total number of CLOG reads jumped from 545323 to 1181851,
> > i.e. 1181851 - 545323 = 636528 CLOG block reads for 1554697 pages of the
> > stock table.
>
> Hmm. So there is some activity there. Could you modify the patch to
> count how many of those reads came from the OS cache? I'm thinking of doing
> a gettimeofday() call before and after each read, and counting how many
> calls finished in, say, less than 1 ms. Also, summing up the total time
> spent in reads would be interesting.


Here are some more numbers. I ran two tests of 4 hours each, with the CLOG
cache size set to 8 blocks (the default) and 16 blocks. I counted the number
of read() calls, and specifically those read() calls which took more than
0.5 ms to complete. As you guessed, almost 99% of the reads complete in less
than 0.5 ms, but the total read() time is still more than 1% of the duration
of the test. Is it worth optimizing?

CLOG (16 blocks)
reads(743317), writes(84), reads > 0.5 ms (5171), time reads (186s), time reads > 0.5 ms (175s)

CLOG (8 blocks)
reads(1155917), writes(119), reads > 0.5 ms (4040), time reads (146s), time reads > 0.5 ms (130s)

(Amused to see the increase in total read time with 16 blocks.)

Also, is it worth optimizing away the read() system calls which might not
cause physical I/O but still consume CPU?

Thanks,
Pavan

EnterpriseDB http://www.enterprisedb.com

From: Heikki Linnakangas on
Pavan Deolasee wrote:
> On 1/24/07, Heikki Linnakangas <heikki(a)enterprisedb.com> wrote:
>> Hmm. So there is some activity there. Could you modify the patch to
>> count how many of those reads came from the OS cache? I'm thinking of doing
>> a gettimeofday() call before and after each read, and counting how many
>> calls finished in, say, less than 1 ms. Also, summing up the total time
>> spent in reads would be interesting.
>
> Here are some more numbers. I ran two tests of 4 hours each, with the CLOG
> cache size set to 8 blocks (the default) and 16 blocks. I counted the number
> of read() calls, and specifically those read() calls which took more than
> 0.5 ms to complete. As you guessed, almost 99% of the reads complete in less
> than 0.5 ms, but the total read() time is still more than 1% of the duration
> of the test. Is it worth optimizing?

Probably not. I wouldn't trust that 1%-of-test-duration figure too much;
gettimeofday() has some overhead of its own...

> CLOG (16 blocks)
> reads(743317), writes(84), reads > 0.5 ms (5171), time reads (186s), time reads > 0.5 ms (175s)
>
> CLOG (8 blocks)
> reads(1155917), writes(119), reads > 0.5 ms (4040), time reads (146s), time reads > 0.5 ms (130s)
>
> (Amused to see the increase in total read time with 16 blocks.)

Hmm. That's surprising.

> Also, is it worth optimizing away the read() system calls which might not
> cause physical I/O but still consume CPU?

I don't think it's worth it, but now that we're talking about it: What
I'd like to do to all the slru files is to replace the custom buffer
management with mmapping the whole file, and letting the OS take care of
it. We would get rid of some guc variables, the OS would tune the amount
of memory used for clog/subtrans dynamically, and we would avoid the
memory copying. And I'd like to do the same for WAL.
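
Just to sketch what I mean for the clog case (untested, error handling
omitted, and the segment handling is hand-waved; the real slru.c would of
course need more care):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CLOG_BITS_PER_XACT   2
#define CLOG_XACTS_PER_BYTE  4

static char  *clog_map;       /* whole segment file mapped read/write */
static size_t clog_map_len;

/* Map one clog segment file; the OS then caches it and writes it back. */
static void
map_clog_segment(const char *path)
{
    int         fd = open(path, O_RDWR);
    struct stat st;

    fstat(fd, &st);
    clog_map_len = st.st_size;
    clog_map = mmap(NULL, clog_map_len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    close(fd);                /* the mapping survives the close */
}

/* A status lookup becomes a plain memory access: no read(), no copying. */
static int
xid_status(uint32_t xid_offset_in_segment)
{
    uint32_t byteno = xid_offset_in_segment / CLOG_XACTS_PER_BYTE;
    int      bshift = (xid_offset_in_segment % CLOG_XACTS_PER_BYTE) *
                      CLOG_BITS_PER_XACT;

    return (clog_map[byteno] >> bshift) & 0x03;
}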

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Pavan Deolasee" on
On 1/25/07, Heikki Linnakangas <heikki(a)enterprisedb.com> wrote:
>
> Pavan Deolasee wrote:
> >
> > Also, is it worth optimizing away the read() system calls which might not
> > cause physical I/O but still consume CPU?
>
> I don't think it's worth it, but now that we're talking about it: What
> I'd like to do to all the slru files is to replace the custom buffer
> management with mmapping the whole file, and letting the OS take care of
> it. We would get rid of some guc variables, the OS would tune the amount
> of memory used for clog/subtrans dynamically, and we would avoid the
> memory copying. And I'd like to do the same for WAL.


Yes, we can do that. One problem, though, is that mmapping wouldn't work when
the CLOG file is extended, because some of the backends may not see the
extended portion. But maybe we can start with a sufficiently large initialized
file and mmap the whole file.
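
Something like this for the pre-sizing (just a sketch; I'm taking the
32-pages-per-segment figure from memory):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define CLOG_SEGMENT_BYTES (32 * 8192)  /* 32 pages of 8 kB per segment file */

/* Extend the segment to its final size before mapping it, so every backend
 * maps the full range up front and nothing has to remap as the clog grows. */
static char *
map_presized_segment(const char *path)
{
    int   fd = open(path, O_RDWR | O_CREAT, 0600);
    char *p;

    ftruncate(fd, CLOG_SEGMENT_BYTES);
    p = mmap(NULL, CLOG_SEGMENT_BYTES, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);
    return p;
}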

Another, simpler solution for VACUUM would be to read the entire CLOG file
into local memory. Most of the transaction status queries can be satisfied
from this local copy, and the normal CLOG is consulted only when the status
is unknown (TRANSACTION_STATUS_IN_PROGRESS).
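
Something along these lines is what I have in mind (pure sketch; the names
and the way the copy is taken are made up, only the fallback to the regular
clog lookup is the existing code path):

#include "postgres.h"
#include "access/clog.h"

/* 4 two-bit statuses per byte, mirroring the private constants in clog.c */
#define LOCAL_CLOG_BITS_PER_XACT   2
#define LOCAL_CLOG_XACTS_PER_BYTE  4

static char         *local_clog;       /* private copy read once at VACUUM start */
static TransactionId local_clog_xids;  /* number of Xids the copy covers */

static XidStatus
local_xid_status(TransactionId xid)
{
    if (xid < local_clog_xids)
    {
        int bshift = (xid % LOCAL_CLOG_XACTS_PER_BYTE) * LOCAL_CLOG_BITS_PER_XACT;
        XidStatus status =
            (local_clog[xid / LOCAL_CLOG_XACTS_PER_BYTE] >> bshift) & 0x03;

        /* Anything already committed or aborted can be answered locally. */
        if (status != TRANSACTION_STATUS_IN_PROGRESS)
            return status;
    }

    /* Newer Xid, or still in progress when the copy was taken:
     * fall back to the shared clog as we do today. */
    return TransactionIdGetStatus(xid);
}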

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com