From: Daniel Farina on
On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I am very fuzzy on where we stand with this patch.  There's a link to
> the initial version on the commitfest application, but there has been
> so much discussion since then that I think there are probably some
> revisions to be made, though I can't say I know exactly what they are.
>
> I did also check the git branch, but it does not merge cleanly with the master.
>
> Is there a more recent version?  Is this still under development?  Or
> have we decided to go a different direction with it?

I think we've decided to go in a different direction for
implementation, and my last communication was to suggest a mechanism
that would allow for something more clean using the copy options
refactoring. I wouldn't even attempt to merge it unless there's big
outcry for the feature as-is, which I doubt there is. But perhaps
it's worthy TODO fodder.

The mechanics in the email you replied to probably need more feedback to act.

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Tue, Dec 29, 2009 at 9:23 PM, Daniel Farina <drfarina(a)acm.org> wrote:
> On Tue, Dec 29, 2009 at 5:57 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I am very fuzzy on where we stand with this patch.  There's a link to
>> the initial version on the commitfest application, but there has been
>> so much discussion since then that I think there are probably some
>> revisions to be made, though I can't say I know exactly what they are.
>>
>> I did also check the git branch, but it does not merge cleanly with the master.
>>
>> Is there a more recent version?  Is this still under development?  Or
>> have we decided to go a different direction with it?
>
> I think we've decided to go in a different direction for
> implementation, and my last communication was to suggest a mechanism
> that would allow for something more clean using the copy options
> refactoring.  I wouldn't even attempt to merge it unless there's big
> outcry for the feature as-is, which I doubt there is.  But perhaps
> it's worthy TODO fodder.
>
> The mechanics in the email you replied to probably need more feedback to act.

I think there's clear support for a version of COPY that returns rows
like a SELECT statement, particularly for use with CTEs. There seems
to be support both for a mode that returns text[] or something like it
and also for a mode that returns a defined record type. But that all
seems separate from what you're proposing here, which is a
considerably lower-level facility which seems like it would not be of
much use to ordinary users, but might be of some value to tool
implementors - or perhaps you'd disagree with that characterization?

Anyway, my specific reaction to your suggestions in the email that I
quoted is that it seems a bit baroque and that I'm not really sure
what it's useful for in practice. I'm certainly not saying it ISN'T
useful, because I can't believe that you would have gone to the
trouble to work through all of this unless you had some ideas about
nifty things that could be done with it, but I think maybe we need to
back up and start by talking about the problems you're trying to
solve, before we get too far down into a discussion of implementation
details. It doesn't appear to me that's been discussed too much so
far, although there's enough enthusiasm here to make me suspect that
other people may understand it better than I do.

Based on your comments above, I'm going to go ahead and mark this
particular patch as Returned with Feedback, since it seems you don't
intend to continue with it and therefore it won't need to be reviewed
during the January CommitFest. But I'm interesting in continuing the
discussion and in any new patches that come out of it.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Daniel Farina on
On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> I think there's clear support for a version of COPY that returns rows
> like a SELECT statement, particularly for use with CTEs.  There seems
> to be support both for a mode that returns text[] or something like it
> and also for a mode that returns a defined record type.  But that all
> seems separate from what you're proposing here, which is a
> considerably lower-level facility which seems like it would not be of
> much use to ordinary users, but might be of some value to tool
> implementors - or perhaps you'd disagree with that characterization?
>

This is in the other direction: freeing COPY from the restriction that
it can only put bytes into two places:

* A network socket (e.g. stdout)
* A file (as supseruser)

Instead, it can hand off bytes to an arbitrary UDF that can handle it
in any way. A clean design should be able to subsume at least the
existing simple behaviors, plus enabling more, as well as potentially
providing inspiration for how to decouple at least a few components of
COPY that perhaps can benefit the long-term cleanup effort there.

> Anyway, my specific reaction to your suggestions in the email that I
> quoted is that it seems a bit baroque and that I'm not really sure
> what it's useful for in practice.  I'm certainly not saying it ISN'T
> useful, because I can't believe that you would have gone to the
> trouble to work through all of this unless you had some ideas about
> nifty things that could be done with it, but I think maybe we need to
> back up and start by talking about the problems you're trying to
> solve, before we get too far down into a discussion of implementation
> details.

At Truviso this is used a piece of our replication solution. In the
patches submitted we see how enhancing dblink allows postgres to copy
directly from one node to another. Truviso uses it to directly write
bytes to a libpq connection (see the dblink patch) in the open COPY
state to achieve direct cross-node bulk loading for the purposes of
replication.

One could imagine a lot of ETL or data warehouse offloading
applications that can be enabled by allowing bytes to be handled by
arbitrary code, although this patch achieves nothing that writing some
middleware could not accomplish: it's just convenient to have and
likely more efficient than writing some application middleware to do
the same thing.

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Tue, Dec 29, 2009 at 9:56 PM, Daniel Farina <drfarina(a)acm.org> wrote:
> On Tue, Dec 29, 2009 at 6:48 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I think there's clear support for a version of COPY that returns rows
>> like a SELECT statement, particularly for use with CTEs.  There seems
>> to be support both for a mode that returns text[] or something like it
>> and also for a mode that returns a defined record type.  But that all
>> seems separate from what you're proposing here, which is a
>> considerably lower-level facility which seems like it would not be of
>> much use to ordinary users, but might be of some value to tool
>> implementors - or perhaps you'd disagree with that characterization?
>>
>
> This is in the other direction: freeing COPY from the restriction that
> it can only put bytes into two places:
>
> * A network socket (e.g. stdout)
> * A file (as supseruser)

Oh, duh. Actually, that seems like a pretty solid idea.

I fear that to make this really useful we would need to define some
new SQL syntax, like:

CREATE [OR REPLACE] COPY TARGET name (STARTUP function_name, STREAM
function_name, SHUTDOWN function_name);
DROP COPY TARGET name;
GRANT USAGE ON COPY TARGET TO ...;

COPY ... TO/FROM TARGET name (generic_option_list) WITH (options);

We could define the startup function to get the parameter list as a
list of DefElems and return an internal structure of its own devising
which would then be passed to the stream and shutdown functions.

It might be possible to do this without introducing a notion of an
explicit object, but there are a couple of problems with that. First,
it requires the user to specify a lot of details on every COPY
invocation, which is awkward. Second, there's a security issue to
think about here. If we were just copying to a UDF that took a string
as an argument, that would be fine, but it's not safe to let
unprivileged users to directly invoke functions that take a
type-internal argument. Introducing an explicit object type would
allow the creation of copy targets to be restricted to super-users but
then granted out to whom the super-user chooses.

Thoughts?

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Daniel Farina on
On Tue, Dec 29, 2009 at 8:11 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> It might be possible to do this without introducing a notion of an
> explicit object, but there are a couple of problems with that.  First,
> it requires the user to specify a lot of details on every COPY
> invocation, which is awkward.  Second, there's a security issue to
> think about here.  If we were just copying to a UDF that took a string
> as an argument, that would be fine, but it's not safe to let
> unprivileged users to directly invoke functions that take a
> type-internal argument.  Introducing an explicit object type would
> allow the creation of copy targets to be restricted to super-users but
> then granted out to whom the super-user chooses.
>
> Thoughts?

Agree on the type internal and superuser access -- indeed, if one were
to refactor the two existing COPY modes into external functions, the
stdout behavior would be marked with SECURITY DEFINER and the to-file
functions would only be superuser-accessible. (Interesting note: that
means copy.c could theoretically lose the special check for superuser
privilege for this mode, relying on standard function permissions...).

I was mostly satisfied with a byzantine but otherwise semantically
simple interface until the idea matures some more -- perhaps in
practice -- to inform the most useful kind of convenience to support
it. I don't see a strong reason personally to rush into defining such
an interface just yet, although an important interface we would have
to define is contract a user would have to follow to enable their very
own fully featured COPY output mode.

Still, the patch as-submitted is quite far from achieving one of my
main litmus test goals: subsumption of existing COPY behavior.
Particularly thorny was how to support the copying-to-a-file
semantics, but I believe that the copy-options patch provide a nice
avenue to solve this problem, as one can call a function in the
options list and somehow pass the return value of that initializer --
which may include a file handle -- to the byte-handling function.

Finally, I think a valid point was made that the patch is much more
powerful to end users if it supports byte arrays, and there are some
open questions as to whether this should be the only/primary supported
mode. I personally like the INTERNAL-type interface, as dealing with
the StringInfo buffer used by current COPY code is very convenient
from C and the current implementation is not very complicated yet
avoids unnecessary copying/value coercions, but I agree there is
definitely value in enabling the use of more normal data types.

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers