From: Tom Lane on 22 Jan 2010 16:05

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Do you have the opportunity to try an experiment on hardware
>> similar to what you're running that on? Create a database with 7
>> million tables and see what the dump/restore times are like, and
>> whether pg_dump/pg_restore appear to be CPU-bound or
>> memory-limited when doing it.

> If these can be empty (or nearly empty) tables, I can probably swing
> it as a background task. You didn't need to match the current 1.3
> TB database size I assume?

Empty is fine.

>> If they aren't, we could conclude that millions of TOC entries
>> isn't a problem.

> I'd actually be rather more concerned about the effects on normal
> query plan times, or are you confident that won't be an issue?

This is only a question of what happens internally in pg_dump and
pg_restore --- I'm not suggesting we change anything on the database
side.

			regards, tom lane
From: "Kevin Grittner" on 22 Jan 2010 16:13 Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > Empty is fine. I'll get started. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: "Kevin Grittner" on 22 Jan 2010 17:27 "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> wrote: > I'll get started. After a couple false starts, the creation of the millions of tables is underway. At the rate it's going, it won't finish for 8.2 hours, so I'll have to come in and test the dump tomorrow morning. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: KaiGai Kohei on 23 Jan 2010 02:39

(2010/01/23 5:12), Tom Lane wrote:
> KaiGai Kohei <kaigai(a)ak.jp.nec.com> writes:
>> The attached patch is a revised version.
>
> I'm inclined to wonder whether this patch doesn't prove that we've
> reached the end of the line for the current representation of blobs
> in pg_dump archives. The alternative that I'm thinking about is to
> treat each blob as an independent object (hence, with its own TOC
> entry). If we did that, then the standard pg_dump mechanisms for
> ownership, ACLs, and comments would apply, and we could get rid of
> the messy hacks that this patch is just adding to. That would also
> open the door to future improvements like being able to selectively
> restore blobs. (Actually you could do it immediately if you didn't
> mind editing a -l file...) And it would for instance allow loading
> of blobs to be parallelized.

I also think it is a better approach than the current blob
representation.

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are actually
> large, there's not going to be an enormous number of them. We've
> heard of people with many tens of thousands of tables, and pg_dump
> speed didn't seem to be a huge bottleneck for them (at least not
> in recent versions). So I'm feeling we should not dismiss the
> idea of one TOC entry per blob.

Even if the database contains a massive number of large objects, all
that pg_dump has to manage in RAM is the metadata, not the data
contents. With one TOC entry per blob, the total volume of I/O
between the server and pg_dump is no different from the current
approach.

If we assume one TOC entry consumes 64 bytes of RAM, it needs 450MB
of RAM for 7 million blobs. On recent computers, is that an
unacceptable pain? If you are trying to dump a TB-class database,
I'd like to assume the machine where pg_dump runs has adequate
storage and RAM.

Thanks,
--
KaiGai Kohei <kaigai(a)kaigai.gr.jp>
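[The arithmetic behind that estimate, with 64 bytes as an assumed
per-entry footprint rather than a measured one:

    7,000,000 entries x 64 bytes/entry = 448,000,000 bytes ~ 450 MB]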
From: "Kevin Grittner" on 23 Jan 2010 09:48
Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
>> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> Do you have the opportunity to try an experiment on hardware
>>> similar to what you're running that on? Create a database with
>>> 7 million tables and see what the dump/restore times are like,
>>> and whether pg_dump/pg_restore appear to be CPU-bound or
>>> memory-limited when doing it.
>
>> If these can be empty (or nearly empty) tables, I can probably
>> swing it as a background task. You didn't need to match the
>> current 1.3 TB database size I assume?
>
> Empty is fine.

After about 15 hours of run time it was around 5.5 million tables;
the rate of creation had slowed rather dramatically. I did create
them with primary keys (out of habit), which was probably the wrong
thing. I canceled the table creation process and started a VACUUM
ANALYZE, figuring that we didn't want any hint-bit writing or bad
statistics confusing the results. That has been running for 30
minutes with 65 MB to 140 MB per second of disk activity, mixed read
and write.

After a few minutes that left me curious just how big the database
was, so I tried:

    select pg_size_pretty(pg_database_size('test'));

I did a Ctrl+C after about five minutes and got:

    Cancel request sent

but it didn't return for 15 or 20 minutes. Any attempt to query
pg_locks stalls. Tab completion stalls. (By the way, this is not
related to the false alarm on that yesterday; that was a result of
my attempting tab completion from within a failed transaction, which
just found nothing rather than stalling.)

So I'm not sure whether I can get to a state suitable for starting
the desired test, but I'll stay with it for a while.

-Kevin
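[An aside on diagnostics at this scale, not from the thread:
pg_database_size() walks the database directory and stats every
file, and with millions of tables plus their indexes that is on the
order of ten million files or more, so the long runtime is not
surprising. A cheaper way to watch progress is to count catalog
rows instead:

    -- Counts ordinary tables by scanning pg_class; that catalog is
    -- itself large here, but this avoids the per-file stat() calls
    -- that pg_database_size() performs.
    select count(*) from pg_class where relkind = 'r';

The count includes the built-in system catalogs, but at this scale
those few hundred rows are noise.]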