From: Takahiro Itagaki on

KaiGai Kohei <kaigai(a)ak.jp.nec.com> wrote:

> This patch renames hasBlobs() to getBlobs() and changes its purpose:
> it registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS for each
> large object owner, if necessary.

This patch adds DumpableObjectType DO_BLOB_ACLS and struct BlobsInfo. We
use three BlobsInfo objects, one each for DO_BLOBS, DO_BLOB_COMMENTS, and
DO_BLOB_ACLS, _for each distinct owner_ of large objects. So, even if we
have many large objects in the database, we keep at most (3 * num-of-roles)
BlobsInfo objects in memory. On older server versions, we assume that all
blobs are owned by a single user with an empty name. No BlobsInfo objects
are created if there are no large objects.


I'm not sure whether we need to make groups for each owner of large objects.
If I remember right, the primary issue was separating the routines that dump
BLOB ACLs from the routines that dump BLOB COMMENTS, right? Why did you make
this change?


Another concern is the confusing identifier names: getBlobs() returns
BlobsInfo objects, one per owner. Could we rename them to something
like getBlobOwners() and BlobOwnerInfo?

Also, DumpableObject.name is not used in BlobsInfo. We could reuse
DumpableObject.name instead of the "rolname" field in BlobsInfo.
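
In other words, something like this (again just illustrating the suggestion,
including a guessed-at declaration in the style of the other get*()
collectors):

typedef struct
{
    DumpableObject dobj;    /* dobj.name carries the owning role name,
                             * so no separate rolname field is needed */
} BlobOwnerInfo;

extern BlobOwnerInfo *getBlobOwners(int *numBlobOwners);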

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




From: Tom Lane on
KaiGai Kohei <kaigai(a)ak.jp.nec.com> writes:
> The attached patch is a revised version.

I'm inclined to wonder whether this patch doesn't prove that we've
reached the end of the line for the current representation of blobs
in pg_dump archives. The alternative that I'm thinking about is to
treat each blob as an independent object (hence, with its own TOC
entry). If we did that, then the standard pg_dump mechanisms for
ownership, ACLs, and comments would apply, and we could get rid of
the messy hacks that this patch is just adding to. That would also
open the door to future improvements like being able to selectively
restore blobs. (Actually you could do it immediately if you didn't
mind editing a -l file...) And it would for instance allow loading
of blobs to be parallelized.
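
Concretely, "one DumpableObject per blob" might look something like the
sketch below (illustrative only, reusing the stand-in DumpableObject from
upthread; this is not a worked-out design):

/*
 * One of these per large object.  On servers that have
 * pg_largeobject_metadata it could be filled from something like
 *   SELECT oid, lomowner, lomacl FROM pg_largeobject_metadata
 * so that each row becomes its own TOC entry.
 */
typedef struct
{
    DumpableObject dobj;    /* dobj.name could be the blob's OID as text */
    char       *rolname;    /* owner, handled by the standard ownership code */
    char       *blobacl;    /* ACL, handled by the standard ACL code */
} BlobInfo;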

Now the argument against that is that it won't scale terribly well
to situations with very large numbers of blobs. However, I'm not
convinced that the current approach of cramming them all into one
TOC entry scales so well either. If your large objects are actually
large, there's not going to be an enormous number of them. We've
heard of people with many tens of thousands of tables, and pg_dump
speed didn't seem to be a huge bottleneck for them (at least not
in recent versions). So I'm feeling we should not dismiss the
idea of one TOC entry per blob.

Thoughts?

regards, tom lane


From: "Kevin Grittner" on
Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are
> actually large, there's not going to be an enormous number of
> them. We've heard of people with many tens of thousands of
> tables, and pg_dump speed didn't seem to be a huge bottleneck for
> them (at least not in recent versions). So I'm feeling we should
> not dismiss the idea of one TOC entry per blob.
>
> Thoughts?

We've got a "DocImage" table with about 7 million rows storing PDF
documents in a bytea column, approaching 1 TB of data. (We don't
want to give up ACID guarantees, replication, etc. by storing them
on the file system with filenames in the database.) This works
pretty well, except that client software occasionally has a tendency
to run out of RAM. The interface could arguably be cleaner if we
used BLOBs, but the security issues have precluded that in
PostgreSQL.

I suspect that 7 million BLOBs (and growing fast) would be a problem
for this approach. Of course, if we're atypical, we could stay with
bytea if this changed. Just a data point.

-Kevin

cir=> select count(*) from "DocImage";
  count
---------
 6891626
(1 row)

cir=> select pg_size_pretty(pg_total_relation_size('"DocImage"'));
 pg_size_pretty
----------------
 956 GB
(1 row)



From: Tom Lane on
"Kevin Grittner" <Kevin.Grittner(a)wicourts.gov> writes:
> Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> We've heard of people with many tens of thousands of
>> tables, and pg_dump speed didn't seem to be a huge bottleneck for
>> them (at least not in recent versions). So I'm feeling we should
>> not dismiss the idea of one TOC entry per blob.
>>
>> Thoughts?

> I suspect that 7 million BLOBs (and growing fast) would be a problem
> for this approach. Of course, if we're atypical, we could stay with
> bytea if this changed. Just a data point.

Do you have the opportunity to try an experiment on hardware similar to
what you're running that on? Create a database with 7 million tables
and see what the dump/restore times are like, and whether
pg_dump/pg_restore appear to be CPU-bound or memory-limited when doing
it. If they aren't, we could conclude that millions of TOC entries
isn't a problem.
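
If it helps, here is a rough libpq sketch for generating the empty tables
(illustrative only; the table names, the 1000-per-transaction batching, and
connecting via the PG* environment variables are all arbitrary choices):

/* compile with something like: cc gen_tables.c -o gen_tables -lpq
 * (plus -I/-L options for your libpq installation as needed) */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(int argc, char **argv)
{
    long        ntables = (argc > 1) ? atol(argv[1]) : 7000000L;
    PGconn     *conn = PQconnectdb("");     /* uses PG* environment vars */
    char        sql[128];
    long        i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    for (i = 0; i < ntables; i++)
    {
        PGresult   *res;

        /* batch the CREATEs into transactions of 1000 to speed things up */
        if (i % 1000 == 0)
            PQclear(PQexec(conn, "BEGIN"));

        snprintf(sql, sizeof(sql), "CREATE TABLE blob_test_%ld (id int)", i);
        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            return 1;
        }
        PQclear(res);

        if (i % 1000 == 999 || i == ntables - 1)
            PQclear(PQexec(conn, "COMMIT"));
    }
    PQfinish(conn);
    return 0;
}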

A compromise we could consider is some sort of sub-TOC-entry scheme that
gets the per-BLOB entries out of the main speed bottlenecks, while still
letting us share most of the logic. For instance, I suspect that the
first bottleneck in pg_dump would be the dependency sorting, but we
don't really need to sort all the blobs individually for that.

regards, tom lane


From: "Kevin Grittner" on
Tom Lane <tgl(a)sss.pgh.pa.us> wrote:

> Do you have the opportunity to try an experiment on hardware
> similar to what you're running that on? Create a database with 7
> million tables and see what the dump/restore times are like, and
> whether pg_dump/pg_restore appear to be CPU-bound or
> memory-limited when doing it.

If these can be empty (or nearly empty) tables, I can probably swing
it as a background task. You don't need me to match the current 1.3
TB database size, I assume?

> If they aren't, we could conclude that millions of TOC entries
> isn't a problem.

I'd actually be rather more concerned about the effects on normal
query plan times, or are you confident that won't be an issue?

> A compromise we could consider is some sort of sub-TOC-entry
> scheme that gets the per-BLOB entries out of the main speed
> bottlenecks, while still letting us share most of the logic. For
> instance, I suspect that the first bottleneck in pg_dump would be
> the dependency sorting, but we don't really need to sort all the
> blobs individually for that.

That might also address the plan time issue, if it actually exists.

-Kevin
