From: Takahiro Itagaki on 21 Jan 2010 02:52

KaiGai Kohei <kaigai(a)ak.jp.nec.com> wrote:

> This patch renames hasBlobs() to getBlobs(), and changes its
> purpose. It registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS
> for each large object owner, if necessary.

This patch adds the DumpableObjectType DO_BLOB_ACLS and the struct
BlobsInfo. We use three BlobsInfo objects (one each for DO_BLOBS,
DO_BLOB_COMMENTS, and DO_BLOB_ACLS) per distinct owner of large objects.
So, even if the database holds many large objects, we keep at most
(3 * number-of-roles) BlobsInfo structs in memory. For older server
versions we assume all blobs are owned by a single user with an empty
name, and if there are no large objects at all, no BlobsInfo is created.

I'm not sure whether we need to group large objects by owner. If I
remember right, the primary issue was separating the routines that dump
BLOB ACLs from the routines that dump BLOB COMMENTS, right? Why did you
make this change?

Another concern is the confusing identifier names: getBlobs() returns
one BlobsInfo per owner. Could we rename them to something like
getBlobOwners() and BlobOwnerInfo? Also, DumpableObject.name is unused
in BlobsInfo; we could reuse DumpableObject.name instead of the separate
"rolname" field.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
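To make the proposed renaming concrete, here is a rough C sketch of the
per-owner structure under discussion; the field layout and comments are
illustrative guesses, not the actual patch:

#include "pg_dump.h"            /* for DumpableObject */

/*
 * Illustrative sketch only: the real patch defines BlobsInfo, and its
 * exact fields may differ. The point is that there is one entry per
 * large-object owner, not one per large object.
 */
typedef struct BlobOwnerInfo
{
    DumpableObject dobj;        /* dobj.objType is DO_BLOBS, DO_BLOB_COMMENTS,
                                 * or DO_BLOB_ACLS; dobj.name could carry the
                                 * role name, making a separate field redundant */
    char       *rolname;        /* owner of this group of large objects */
} BlobOwnerInfo;

With this shape, memory usage is bounded by the number of roles that own
large objects (times three), regardless of how many large objects exist.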
From: Tom Lane on 22 Jan 2010 15:12

KaiGai Kohei <kaigai(a)ak.jp.nec.com> writes:
> The attached patch is a revised version.

I'm inclined to wonder whether this patch doesn't prove that we've
reached the end of the line for the current representation of blobs in
pg_dump archives. The alternative that I'm thinking about is to treat
each blob as an independent object (hence, with its own TOC entry).

If we did that, then the standard pg_dump mechanisms for ownership,
ACLs, and comments would apply, and we could get rid of the messy hacks
that this patch is just adding to. That would also open the door to
future improvements like being able to selectively restore blobs.
(Actually you could do it immediately if you didn't mind editing a -l
file...) And it would for instance allow loading of blobs to be
parallelized.

Now the argument against that is that it won't scale terribly well to
situations with very large numbers of blobs. However, I'm not convinced
that the current approach of cramming them all into one TOC entry scales
so well either. If your large objects are actually large, there's not
going to be an enormous number of them. We've heard of people with many
tens of thousands of tables, and pg_dump speed didn't seem to be a huge
bottleneck for them (at least not in recent versions). So I'm feeling we
should not dismiss the idea of one TOC entry per blob.

Thoughts?

            regards, tom lane
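As a rough illustration of the alternative (not actual pg_dump code:
register_blob_toc_entry() is a hypothetical stand-in for the real
ArchiveEntry() call, and the catalog query assumes a 9.0 server with
pg_largeobject_metadata):

#include <libpq-fe.h>

/* Hypothetical stand-in for the real TOC registration call. */
static void register_blob_toc_entry(PGconn *conn, const char *loid,
                                    const char *owner);

/*
 * Sketch: one dumpable object per large object, so the standard
 * ownership/ACL/comment machinery applies to each blob individually.
 */
static void
collect_blob_entries(PGconn *conn)
{
    PGresult   *res;
    int         i;

    res = PQexec(conn,
                 "SELECT l.oid, r.rolname "
                 "FROM pg_largeobject_metadata l "
                 "JOIN pg_roles r ON r.oid = l.lomowner");

    for (i = 0; i < PQntuples(res); i++)
    {
        /* One TOC entry per blob: selective restore and parallel
         * restore then come for free from the existing machinery. */
        register_blob_toc_entry(conn,
                                PQgetvalue(res, i, 0),
                                PQgetvalue(res, i, 1));
    }
    PQclear(res);
}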
From: "Kevin Grittner" on 22 Jan 2010 15:46

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are
> actually large, there's not going to be an enormous number of
> them. We've heard of people with many tens of thousands of
> tables, and pg_dump speed didn't seem to be a huge bottleneck for
> them (at least not in recent versions). So I'm feeling we should
> not dismiss the idea of one TOC entry per blob.
>
> Thoughts?

We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data. (We don't want
to give up ACID guarantees, replication, etc. by storing them on the
file system with filenames in the database.) This works pretty well,
except that client software occasionally has a tendency to run out of
RAM. The interface could arguably be cleaner if we used BLOBs, but the
security issues have precluded that in PostgreSQL.

I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach. Of course, if we're atypical, we could stay with
bytea if this changed. Just a data point.

-Kevin

cir=> select count(*) from "DocImage";
  count
---------
 6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
 pg_size_pretty
----------------
 956 GB
(1 row)
From: Tom Lane on 22 Jan 2010 15:55

"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> We've heard of people with many tens of thousands of
>> tables, and pg_dump speed didn't seem to be a huge bottleneck for
>> them (at least not in recent versions). So I'm feeling we should
>> not dismiss the idea of one TOC entry per blob.
>>
>> Thoughts?

> I suspect that 7 million BLOBs (and growing fast) would be a problem
> for this approach. Of course, if we're atypical, we could stay with
> bytea if this changed. Just a data point.

Do you have the opportunity to try an experiment on hardware similar to
what you're running that on? Create a database with 7 million tables
and see what the dump/restore times are like, and whether
pg_dump/pg_restore appear to be CPU-bound or memory-limited when doing
it. If they aren't, we could conclude that millions of TOC entries
isn't a problem.

A compromise we could consider is some sort of sub-TOC-entry scheme
that gets the per-BLOB entries out of the main speed bottlenecks, while
still letting us share most of the logic. For instance, I suspect that
the first bottleneck in pg_dump would be the dependency sorting, but we
don't really need to sort all the blobs individually for that.

            regards, tom lane
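A throwaway libpq harness for the experiment being suggested might look
like this (database name and table naming are made up; compile with -lpq):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Connection string and scale are illustrative. */
    PGconn *conn = PQconnectdb("dbname=toc_test");
    char    sql[128];
    long    i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Create millions of (nearly) empty tables, one TOC entry each. */
    for (i = 0; i < 7000000L; i++)
    {
        snprintf(sql, sizeof(sql),
                 "CREATE TABLE junk_%ld (id integer)", i);
        PGresult *res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "CREATE TABLE failed: %s",
                    PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}

Timing pg_dump and pg_restore against the resulting database, while
watching CPU and memory use, would show whether millions of TOC entries
are themselves the bottleneck.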
From: "Kevin Grittner" on 22 Jan 2010 16:02

Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Do you have the opportunity to try an experiment on hardware
> similar to what you're running that on? Create a database with 7
> million tables and see what the dump/restore times are like, and
> whether pg_dump/pg_restore appear to be CPU-bound or
> memory-limited when doing it.

If these can be empty (or nearly empty) tables, I can probably swing it
as a background task. You didn't need to match the current 1.3 TB
database size, I assume?

> If they aren't, we could conclude that millions of TOC entries
> isn't a problem.

I'd actually be rather more concerned about the effects on normal query
plan times, or are you confident that won't be an issue?

> A compromise we could consider is some sort of sub-TOC-entry
> scheme that gets the per-BLOB entries out of the main speed
> bottlenecks, while still letting us share most of the logic. For
> instance, I suspect that the first bottleneck in pg_dump would be
> the dependency sorting, but we don't really need to sort all the
> blobs individually for that.

That might also address the plan time issue, if it actually exists.

-Kevin