Prev: [HACKERS] [PATCH 1/4] Add "COPY ... TO FUNCTION ..." support
Next: Backup history file should be replicated in Streaming Replication?
From: Daniel Farina on 29 Dec 2009 21:23 On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > I am very fuzzy on where we stand with this patch. There's a link to > the initial version on the commitfest application, but there has been > so much discussion since then that I think there are probably some > revisions to be made, though I can't say I know exactly what they are. > > I did also check the git branch, but it does not merge cleanly with the master. > > Is there a more recent version? Is this still under development? Or > have we decided to go a different direction with it? I think we've decided to go in a different direction for implementation, and my last communication was to suggest a mechanism that would allow for something more clean using the copy options refactoring. I wouldn't even attempt to merge it unless there's big outcry for the feature as-is, which I doubt there is. But perhaps it's worthy TODO fodder. The mechanics in the email you replied to probably need more feedback to act. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 29 Dec 2009 21:48 On Tue, Dec 29, 2009 at 9:23 PM, Daniel Farina <drfarina(a)acm.org> wrote: > On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: >> I am very fuzzy on where we stand with this patch. There's a link to >> the initial version on the commitfest application, but there has been >> so much discussion since then that I think there are probably some >> revisions to be made, though I can't say I know exactly what they are. >> >> I did also check the git branch, but it does not merge cleanly with the master. >> >> Is there a more recent version? Is this still under development? Or >> have we decided to go a different direction with it? > > I think we've decided to go in a different direction for > implementation, and my last communication was to suggest a mechanism > that would allow for something more clean using the copy options > refactoring. I wouldn't even attempt to merge it unless there's big > outcry for the feature as-is, which I doubt there is. But perhaps > it's worthy TODO fodder. > > The mechanics in the email you replied to probably need more feedback to act. I think there's clear support for a version of COPY that returns rows like a SELECT statement, particularly for use with CTEs. There seems to be support both for a mode that returns text[] or something like it and also for a mode that returns a defined record type. But that all seems separate from what you're proposing here, which is a considerably lower-level facility which seems like it would not be of much use to ordinary users, but might be of some value to tool implementors - or perhaps you'd disagree with that characterization? Anyway, my specific reaction to your suggestions in the email that I quoted is that it seems a bit baroque and that I'm not really sure what it's useful for in practice. I'm certainly not saying it ISN'T useful, because I can't believe that you would have gone to the trouble to work through all of this unless you had some ideas about nifty things that could be done with it, but I think maybe we need to back up and start by talking about the problems you're trying to solve, before we get too far down into a discussion of implementation details. It doesn't appear to me that's been discussed too much so far, although there's enough enthusiasm here to make me suspect that other people may understand it better than I do. Based on your comments above, I'm going to go ahead and mark this particular patch as Returned with Feedback, since it seems you don't intend to continue with it and therefore it won't need to be reviewed during the January CommitFest. But I'm interesting in continuing the discussion and in any new patches that come out of it. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Daniel Farina on 29 Dec 2009 21:56 On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > I think there's clear support for a version of COPY that returns rows > like a SELECT statement, particularly for use with CTEs. There seems > to be support both for a mode that returns text[] or something like it > and also for a mode that returns a defined record type. But that all > seems separate from what you're proposing here, which is a > considerably lower-level facility which seems like it would not be of > much use to ordinary users, but might be of some value to tool > implementors - or perhaps you'd disagree with that characterization? > This is in the other direction: freeing COPY from the restriction that it can only put bytes into two places: * A network socket (e.g. stdout) * A file (as supseruser) Instead, it can hand off bytes to an arbitrary UDF that can handle it in any way. A clean design should be able to subsume at least the existing simple behaviors, plus enabling more, as well as potentially providing inspiration for how to decouple at least a few components of COPY that perhaps can benefit the long-term cleanup effort there. > Anyway, my specific reaction to your suggestions in the email that I > quoted is that it seems a bit baroque and that I'm not really sure > what it's useful for in practice. I'm certainly not saying it ISN'T > useful, because I can't believe that you would have gone to the > trouble to work through all of this unless you had some ideas about > nifty things that could be done with it, but I think maybe we need to > back up and start by talking about the problems you're trying to > solve, before we get too far down into a discussion of implementation > details. At Truviso this is used a piece of our replication solution. In the patches submitted we see how enhancing dblink allows postgres to copy directly from one node to another. Truviso uses it to directly write bytes to a libpq connection (see the dblink patch) in the open COPY state to achieve direct cross-node bulk loading for the purposes of replication. One could imagine a lot of ETL or data warehouse offloading applications that can be enabled by allowing bytes to be handled by arbitrary code, although this patch achieves nothing that writing some middleware could not accomplish: it's just convenient to have and likely more efficient than writing some application middleware to do the same thing. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 29 Dec 2009 23:11 On Tue, Dec 29, 2009 at 9:56 PM, Daniel Farina <drfarina(a)acm.org> wrote: > On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: >> I think there's clear support for a version of COPY that returns rows >> like a SELECT statement, particularly for use with CTEs. There seems >> to be support both for a mode that returns text[] or something like it >> and also for a mode that returns a defined record type. But that all >> seems separate from what you're proposing here, which is a >> considerably lower-level facility which seems like it would not be of >> much use to ordinary users, but might be of some value to tool >> implementors - or perhaps you'd disagree with that characterization? >> > > This is in the other direction: freeing COPY from the restriction that > it can only put bytes into two places: > > * A network socket (e.g. stdout) > * A file (as supseruser) Oh, duh. Actually, that seems like a pretty solid idea. I fear that to make this really useful we would need to define some new SQL syntax, like: CREATE [OR REPLACE] COPY TARGET name (STARTUP function_name, STREAM function_name, SHUTDOWN function_name); DROP COPY TARGET name; GRANT USAGE ON COPY TARGET TO ...; COPY ... TO/FROM TARGET name (generic_option_list) WITH (options); We could define the startup function to get the parameter list as a list of DefElems and return an internal structure of its own devising which would then be passed to the stream and shutdown functions. It might be possible to do this without introducing a notion of an explicit object, but there are a couple of problems with that. First, it requires the user to specify a lot of details on every COPY invocation, which is awkward. Second, there's a security issue to think about here. If we were just copying to a UDF that took a string as an argument, that would be fine, but it's not safe to let unprivileged users to directly invoke functions that take a type-internal argument. Introducing an explicit object type would allow the creation of copy targets to be restricted to super-users but then granted out to whom the super-user chooses. Thoughts? ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Daniel Farina on 29 Dec 2009 23:47
On Tue, Dec 29, 2009 at 8:11 PM, Robert Haas <robertmhaas(a)gmail.com> wrote: > It might be possible to do this without introducing a notion of an > explicit object, but there are a couple of problems with that. First, > it requires the user to specify a lot of details on every COPY > invocation, which is awkward. Second, there's a security issue to > think about here. If we were just copying to a UDF that took a string > as an argument, that would be fine, but it's not safe to let > unprivileged users to directly invoke functions that take a > type-internal argument. Introducing an explicit object type would > allow the creation of copy targets to be restricted to super-users but > then granted out to whom the super-user chooses. > > Thoughts? Agree on the type internal and superuser access -- indeed, if one were to refactor the two existing COPY modes into external functions, the stdout behavior would be marked with SECURITY DEFINER and the to-file functions would only be superuser-accessible. (Interesting note: that means copy.c could theoretically lose the special check for superuser privilege for this mode, relying on standard function permissions...). I was mostly satisfied with a byzantine but otherwise semantically simple interface until the idea matures some more -- perhaps in practice -- to inform the most useful kind of convenience to support it. I don't see a strong reason personally to rush into defining such an interface just yet, although an important interface we would have to define is contract a user would have to follow to enable their very own fully featured COPY output mode. Still, the patch as-submitted is quite far from achieving one of my main litmus test goals: subsumption of existing COPY behavior. Particularly thorny was how to support the copying-to-a-file semantics, but I believe that the copy-options patch provide a nice avenue to solve this problem, as one can call a function in the options list and somehow pass the return value of that initializer -- which may include a file handle -- to the byte-handling function. Finally, I think a valid point was made that the patch is much more powerful to end users if it supports byte arrays, and there are some open questions as to whether this should be the only/primary supported mode. I personally like the INTERNAL-type interface, as dealing with the StringInfo buffer used by current COPY code is very convenient from C and the current implementation is not very complicated yet avoids unnecessary copying/value coercions, but I agree there is definitely value in enabling the use of more normal data types. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |