From: Daniel Farina on
On Wed, Nov 25, 2009 at 9:35 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote:
> On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote:
>>
>> I worry that we're getting further away from the original problem. Let's
>> allow functions to get the bytes of data from a COPY, like the original
>> proposal. I am not sure COPY is the best mechanism to move records
>> around when INSERT ... SELECT already does that.
>>
>
>
> I am not at all sure I think that's a good idea, though. We have
> pg_read_file() for getting raw bytes from files. Building that into COPY
> does not strike me as a good fit.

I think we speak of the opposite direction...

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Daniel Farina on
On Thu, Nov 26, 2009 at 6:13 PM, David Fetter <david(a)fetter.org> wrote:
> It'd be nice to make this available to the next revision of DBI-Link
> and it'll be pretty handy for our SQL/MED whenever that happens.

Okay, so this thread sort of wandered into how we could refactor other
elements of COPY. Do we have a good sense on what we should do to the
current patch (or at least the idea represented by it) to get it into
a committable state within finite time?

I think adding a bytea and/or text mode is once such improvement...I
am still reluctant to give up on INTERNAL because the string buffer
passed in the INTERNAL scenario is ideal for C programmers -- the
interface is even simpler than dealing with varlena types. But I
agree that auxiliary modes should exist to enable easier hacking.

The thorniest issue in my mind is how state can be initialized
retained and/or modified between calls to the bytestream-acceptance
function.

Arguably it is already in a state where it is no worse than dblink,
which itself has a global hash table to manage state.

Also, if you look carefully at the dblink test suite I submitted,
you'll see an interesting trick: one can COPY from multiple sources
consecutively to a single COPY on a remote node when in text mode
(binary mode has a header that cannot be so neatly catenated). This
is something that's pretty hard to enable with any automatic
startup-work-cleanup approach.

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Daniel Farina on
On Sun, Nov 29, 2009 at 6:35 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> What if the network buffer is flushed in the middle of a line? Is that
> possible, or is there a guard against that somewhere?

What do you mean? They both catenate onto one stream of bytes, it
shouldn't matter where the flush boundaries are...

It so happens as a convenient property of the textual modes is that
adding more payload is purely concatenative (not true for binary,
where there's a header that would cause confusion to the receiving
side)

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Daniel Farina on
On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> Jeff Davis wrote:
>
> COPY target FROM FUNCTION foo() WITH RECORDS;
>
>
> In what format would the records be?

As a not-quite aside, I think I have a better syntax suggestion. I
have recently been digging around in the grammar with the changes made
in the following commit:

commit a6833fbb85cb5212a9d8085849e7281807f732a6
Author: Tom Lane <tgl(a)sss.pgh.pa.us>
Date: Mon Sep 21 20:10:21 2009 +0000

Define a new, more extensible syntax for COPY options.

This is intentionally similar to the recently revised syntax for EXPLAIN
options, ie, (name value, ...). The old syntax is still supported for
backwards compatibility, but we intend that any options added in future
will be provided only in the new syntax.

Robert Haas, Emmanuel Cecchet

As it turns out, the following syntax may work pretty well:

COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42))

Basically the func_expr reduction fits very neatly into the
copy_generic_opt_elem reduction:

copy_generic_opt_elem:
ColLabel copy_generic_opt_arg
{
$$ = (Node *) makeDefElem($1, $2);
}
| ColLabel func_expr
{
$$ = (Node *) $2;
}
;

Now we can use more or less any reasonable number of symbol names and
function calls we desire. This makes life considerably easier, I
think...

We can also try to refactor COPY's internals to take advantage of
these features (and potentially reduce the number of mechanisms. For
example, the legacy "COPY ... TO '/place' WITH CSV" perhaps can be
more verbosely/generically expressed as:

COPY ... TO FUNCTION (setup_function to_file('/place'),
record_converter csv_formatter,
stream_function fwrite
end_function fclose);

We can also add auxiliary symbols for error handling behavior. For
example, were the COPY to fail for some reason maybe it would make
sense "on_error" to call "unlink" to clean up the partially finished
file.

I also have what I think is a somewhat interesting hack. Consider
some of the functions up there without arguments (presumably they'd be
called with a somewhat fixed contract the mechanics of COPY itself):
how does one disambiguate them? Ideally, one could sometimes use
literal arguments (when the return value of that function is desired
to be threaded through the other specified functions) and other times
it'd be nice to disambiguate functions via type names. That would
look something like the following:

COPY ... TO FUNCTION (setup_function to_file('/place'),
record_converter csv_formatter(record),
stream_function fwrite(bytea),
end_function fclose(text));

I think this is possible to implement without much ambiguity, drawing
on the observation that the COPY statement does not have -- and
probably will never have -- references via Var(iable) node, unlike
normal SQL statements such as SELECT, INSERT, et al. That means we
might be able disambiguate using the following rules when scanning the
funcexpr's arguments during the semantic analysis phase to figure out
what to do:

* Only literal list items found: it's a function call with the types
of those literals. Ex: my_setup_function('foo'::text, 3)

* Only non-literal list items found: it's type specifiers. Ex:
csv_formatter(record).

* Both literal and non-literal values found: report an error.

This only works because no cases where a non-literal quantity could be
confused with a type name come to mind. If one could name a type "3"
and being forced to double-quote "3" to get your type disambiguated
was just too ugly, then we are at an impasse. But otherwise I think
this may work quite well.

Common constellations of functions could perhaps be bound together
into a DDL to reduce the amount of symbol soup going on here, but that
seems like a pretty clean transition strategy at some later time.
Most of the functionality could still be captured with this simple
approach for now...

Also note that factoring out high-performance implementations of
things like csv_formatter (and friends: pg_binary_formatter) will
probably take some time, but ultimately I think all the existing
functionality could be realized as a layer of syntactic sugar over
this mechanism.

fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Sat, Dec 5, 2009 at 3:32 AM, Daniel Farina <drfarina(a)acm.org> wrote:
> On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
>> Jeff Davis wrote:
>>
>> COPY target FROM FUNCTION foo() WITH RECORDS;
>>
>>
>> In what format would the records be?
>
> As a not-quite aside, I think I have a better syntax suggestion.  I
> have recently been digging around in the grammar with the changes made
> in the following commit:
>
> commit a6833fbb85cb5212a9d8085849e7281807f732a6
> Author: Tom Lane <tgl(a)sss.pgh.pa.us>
> Date:   Mon Sep 21 20:10:21 2009 +0000
>
>    Define a new, more extensible syntax for COPY options.
>
>    This is intentionally similar to the recently revised syntax for EXPLAIN
>    options, ie, (name value, ...).  The old syntax is still supported for
>    backwards compatibility, but we intend that any options added in future
>    will be provided only in the new syntax.
>
>    Robert Haas, Emmanuel Cecchet
>
> As it turns out, the following syntax may work pretty well:
>
>  COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42))
>
> Basically the func_expr reduction fits very neatly into the
> copy_generic_opt_elem reduction:
>
>    copy_generic_opt_elem:
>                            ColLabel copy_generic_opt_arg
>                                    {
>                                            $$ = (Node *) makeDefElem($1, $2);
>                                    }
>                            | ColLabel func_expr
>                                    {
>                                            $$ = (Node *) $2;
>                                    }
>                    ;
>
> Now we can use more or less any reasonable number of symbol names and
> function calls we desire.  This makes life considerably easier, I
> think...
>
> We can also try to refactor COPY's internals to take advantage of
> these features (and potentially reduce the number of mechanisms.  For
> example, the legacy "COPY ... TO '/place' WITH CSV" perhaps can be
> more verbosely/generically expressed as:
>
>  COPY ... TO FUNCTION (setup_function to_file('/place'),
>                        record_converter csv_formatter,
>                        stream_function fwrite
>                        end_function fclose);
>
> We can also add auxiliary symbols for error handling behavior.  For
> example, were the COPY to fail for some reason maybe it would make
> sense "on_error" to call "unlink" to clean up the partially finished
> file.
>
> I also have what I think is a somewhat interesting hack.  Consider
> some of the functions up there without arguments (presumably they'd be
> called with a somewhat fixed contract the mechanics of COPY itself):
> how does one disambiguate them?  Ideally, one could sometimes use
> literal arguments (when the return value of that function is desired
> to be threaded through the other specified functions) and other times
> it'd be nice to disambiguate functions via type names.  That would
> look something like the following:
>
>  COPY ... TO FUNCTION (setup_function to_file('/place'),
>                        record_converter csv_formatter(record),
>                        stream_function fwrite(bytea),
>                        end_function fclose(text));
>
> I think this is possible to implement without much ambiguity, drawing
> on the observation that the COPY statement does not have -- and
> probably will never have -- references via Var(iable) node, unlike
> normal SQL statements such as SELECT, INSERT, et al.  That means we
> might be able disambiguate using the following rules when scanning the
> funcexpr's arguments during the semantic analysis phase to figure out
> what to do:
>
>  * Only literal list items found: it's a function call with the types
>    of those literals.  Ex: my_setup_function('foo'::text, 3)
>
>  * Only non-literal list items found: it's type specifiers.  Ex:
>    csv_formatter(record).
>
>  * Both literal and non-literal values found: report an error.
>
> This only works because no cases where a non-literal quantity could be
> confused with a type name come to mind.  If one could name a type "3"
> and being forced to double-quote "3" to get your type disambiguated
> was just too ugly, then we are at an impasse.  But otherwise I think
> this may work quite well.
>
> Common constellations of functions could perhaps be bound together
> into a DDL to reduce the amount of symbol soup going on here, but that
> seems like a pretty clean transition strategy at some later time.
> Most of the functionality could still be captured with this simple
> approach for now...
>
> Also note that factoring out high-performance implementations of
> things like csv_formatter (and friends: pg_binary_formatter) will
> probably take some time, but ultimately I think all the existing
> functionality could be realized as a layer of syntactic sugar over
> this mechanism.

I am very fuzzy on where we stand with this patch. There's a link to
the initial version on the commitfest application, but there has been
so much discussion since then that I think there are probably some
revisions to be made, though I can't say I know exactly what they are.

I did also check the git branch, but it does not merge cleanly with the master.

Is there a more recent version? Is this still under development? Or
have we decided to go a different direction with it?

Thanks,

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers