From: Heikki Linnakangas on 24 Jan 2007 03:28

Pavan Deolasee wrote:
> I have just counted the number of read/write calls on the CLOG blocks.
> As you can see, the total number of CLOG reads jumped from 545323 to
> 1181851, i.e. 1181851 - 545323 = 636528 CLOG block reads for 1554697
> pages of the stock table.

Hmm. So there is some activity there. Could you modify the patch to count
how many of those reads came from the OS cache? I'm thinking of doing a
gettimeofday() call before and after each read, and counting how many
calls finish in less than, say, 1 ms. Summing up the total time spent in
reads would also be interesting.

Or would it be possible to put the clog on a different drive and use
iostat to get the numbers?

> This figure is only indicative, since it also includes the CLOG block
> reads which would have happened as part of other backend operations
> (VACUUM took almost 55 minutes to complete). Still, in the first 210
> minutes of the run the total reads were only 545323, so most of the
> 636528 reads in the next 55 minutes can be attributed to VACUUM.

Actually, clog reads during normal activity are even worse.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
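A minimal sketch of the kind of instrumentation described above, assuming
the read() in slru.c is routed through a small wrapper; the function and
counter names here are made up for illustration and are not from any
actual patch:

#include <sys/time.h>
#include <unistd.h>

static long   slru_read_calls = 0;   /* total read() calls */
static long   slru_slow_reads = 0;   /* reads slower than 1 ms: likely physical I/O */
static double slru_read_secs  = 0.0; /* total time spent in read() */

static ssize_t
timed_read(int fd, void *buf, size_t len)
{
    struct timeval before, after;
    ssize_t nread;
    double elapsed;

    gettimeofday(&before, NULL);
    nread = read(fd, buf, len);
    gettimeofday(&after, NULL);

    elapsed = (after.tv_sec - before.tv_sec) +
              (after.tv_usec - before.tv_usec) / 1000000.0;

    slru_read_calls++;
    slru_read_secs += elapsed;
    if (elapsed >= 0.001)   /* took 1 ms or more: probably missed the OS cache */
        slru_slow_reads++;

    return nread;
}

Reads that return in well under a millisecond almost certainly came from
the OS page cache; the slow ones are the real disk reads.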
From: "Simon Riggs" on 24 Jan 2007 05:43

On Wed, 2007-01-24 at 09:32 +0530, Pavan Deolasee wrote:
> On a typical desktop-class 2-CPU Dell machine, we have seen pgbench
> clocking more than 1500 tps. That implies CLOG would get filled up in
> less than 262144/1500 = 174 seconds. VACUUM on the accounts table takes
> much longer than that to trigger.

You assume that all of the top-level transactions have no
subtransactions. On that test, subtransactions are in use because of the
EXCEPTION clause in the PL/pgSQL used. That should at least double the
number of Xids.

> So most of the 636528 reads in the next 55 minutes can be attributed to
> VACUUM.

A similar argument might also be applied to subtrans, so a similar
investigation seems worthwhile. Subtrans has space for fewer Xids than
clog, BTW. OTOH, I do think that 99% of that will not cause I/O.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
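As a back-of-the-envelope illustration of that point: if each transaction
consumes two Xids (the top-level Xid plus one subtransaction for the
EXCEPTION block), the same 262144-Xid window fills in roughly
262144 / (2 * 1500) ≈ 87 seconds rather than 174, and faster still if more
subtransactions are involved.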
From: "Pavan Deolasee" on 25 Jan 2007 04:01

On 1/24/07, Heikki Linnakangas <heikki(a)enterprisedb.com> wrote:
> Hmm. So there is some activity there. Could you modify the patch to
> count how many of those reads came from the OS cache? I'm thinking of
> doing a gettimeofday() call before and after each read, and counting
> how many calls finish in less than, say, 1 ms. Summing up the total
> time spent in reads would also be interesting.

Here are some more numbers. I ran two tests of 4 hours each, with the
CLOG cache size set to 8 blocks (the default) and to 16 blocks. I counted
the number of read() calls, and specifically those read() calls which
took more than 0.5 ms to complete. As you guessed, almost 99% of the
reads complete in less than 0.5 ms, but the total read() time is still
more than 1% of the duration of the test. Is it worth optimizing?

CLOG (16 blocks):
reads (743317), writes (84), reads > 0.5 ms (5171),
time in reads (186s), time in reads > 0.5 ms (175s)

CLOG (8 blocks):
reads (1155917), writes (119), reads > 0.5 ms (4040),
time in reads (146s), time in reads > 0.5 ms (130s)

(Amused to see the increase in the total read time with 16 blocks.)

Also, is it worth optimizing away the read() system calls which might not
cause physical I/O, but still consume CPU?

Thanks,
Pavan

EnterpriseDB   http://www.enterprisedb.com
From: Heikki Linnakangas on 25 Jan 2007 05:21

Pavan Deolasee wrote:
> Here are some more numbers. I ran two tests of 4 hours each, with the
> CLOG cache size set to 8 blocks (the default) and to 16 blocks. I
> counted the number of read() calls, and specifically those read() calls
> which took more than 0.5 ms to complete. As you guessed, almost 99% of
> the reads complete in less than 0.5 ms, but the total read() time is
> still more than 1% of the duration of the test. Is it worth optimizing?

Probably not. I wouldn't trust that 1% of test duration figure too much;
gettimeofday() has some overhead of its own...

> CLOG (16 blocks):
> reads (743317), writes (84), reads > 0.5 ms (5171),
> time in reads (186s), time in reads > 0.5 ms (175s)
>
> CLOG (8 blocks):
> reads (1155917), writes (119), reads > 0.5 ms (4040),
> time in reads (146s), time in reads > 0.5 ms (130s)
>
> (Amused to see the increase in the total read time with 16 blocks.)

Hmm. That's surprising.

> Also, is it worth optimizing away the read() system calls which might
> not cause physical I/O, but still consume CPU?

I don't think it's worth it, but now that we're talking about it: what
I'd like to do for all the slru files is to replace the custom buffer
management with mmapping the whole file and letting the OS take care of
it. We would get rid of some GUC variables, the OS would tune the amount
of memory used for clog/subtrans dynamically, and we would avoid the
memory copying. And I'd like to do the same for WAL.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
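A rough sketch of the mmap idea, assuming a fixed-size, pre-created clog
segment file. The two-bits-per-transaction layout matches clog's on-disk
format, but everything else (the function names, error handling, write and
extension handling) is only illustrative:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CLOG_BITS_PER_XACT   2     /* clog stores two status bits per transaction */
#define CLOG_XACTS_PER_BYTE  4

static char  *clog_map;            /* whole segment mapped into the address space */
static size_t clog_map_len;

/* Map an existing clog segment file; returns 0 on success, -1 on error. */
static int
clog_map_segment(const char *path)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return -1;
    }
    clog_map_len = (size_t) st.st_size;
    clog_map = mmap(NULL, clog_map_len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping survives the close */
    return (clog_map == MAP_FAILED) ? -1 : 0;
}

/* Look up a transaction's two status bits directly in the mapping. */
static int
clog_get_status(unsigned int xact_offset_in_segment)
{
    unsigned char b = clog_map[xact_offset_in_segment / CLOG_XACTS_PER_BYTE];
    int shift = (xact_offset_in_segment % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;

    return (b >> shift) & 0x03;
}

The OS would then decide how much of the file stays in memory, which is
the dynamic tuning mentioned above; how to handle extension of the file is
the obvious open question.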
From: "Pavan Deolasee" on 25 Jan 2007 06:17
On 1/25/07, Heikki Linnakangas <heikki(a)enterprisedb.com> wrote:
> I don't think it's worth it, but now that we're talking about it: what
> I'd like to do for all the slru files is to replace the custom buffer
> management with mmapping the whole file and letting the OS take care of
> it. We would get rid of some GUC variables, the OS would tune the amount
> of memory used for clog/subtrans dynamically, and we would avoid the
> memory copying. And I'd like to do the same for WAL.

Yes, we can do that. One problem, though, is that mmapping wouldn't work
when the CLOG file is extended, since some of the backends may not see the
extended portion. But maybe we can start with a sufficiently large
initialized file and mmap the whole file.

Another, simpler solution for VACUUM would be to read the entire CLOG file
into local memory. Most of the transaction status queries can be satisfied
from this local copy, and the normal CLOG is consulted only when the
status is unknown (TRANSACTION_STATUS_IN_PROGRESS).

Thanks,
Pavan

--
EnterpriseDB   http://www.enterprisedb.com
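A minimal sketch of that second idea, assuming VACUUM takes a one-shot
private copy of the clog at startup and only falls back to the shared clog
for the in-progress case. The helper names are made up; segment handling
and error checking are omitted:

#include <stdio.h>
#include <stdlib.h>

#define TRANSACTION_STATUS_IN_PROGRESS  0x00  /* as in clog.h */
#define CLOG_BITS_PER_XACT              2
#define CLOG_XACTS_PER_BYTE             4

static unsigned char *clog_snapshot;          /* backend-local copy, read once */
static size_t         clog_snapshot_len;

/* Slurp the clog file into local memory at the start of VACUUM. */
static void
vacuum_load_clog_snapshot(const char *path)
{
    FILE *f = fopen(path, "rb");

    fseek(f, 0, SEEK_END);
    clog_snapshot_len = (size_t) ftell(f);
    rewind(f);

    clog_snapshot = malloc(clog_snapshot_len);
    fread(clog_snapshot, 1, clog_snapshot_len, f);
    fclose(f);
}

/* Stand-in for the real shared-clog lookup (hypothetical). */
static int
shared_clog_get_status(unsigned int xid)
{
    /* In the real thing this would go through the slru buffers. */
    return TRANSACTION_STATUS_IN_PROGRESS;
}

static int
vacuum_get_xid_status(unsigned int xid)
{
    unsigned char b = clog_snapshot[xid / CLOG_XACTS_PER_BYTE];
    int shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    int status = (b >> shift) & 0x03;

    /*
     * "In progress" in the snapshot may just mean the transaction ended
     * after the copy was taken, so only then consult the shared clog.
     */
    if (status == TRANSACTION_STATUS_IN_PROGRESS)
        return shared_clog_get_status(xid);
    return status;
}

Committed and aborted are stable states, so answers from the snapshot
never go stale; only the in-progress case needs the fallback.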