Prev: [HACKERS] [PATCH 1/4] Add "COPY ... TO FUNCTION ..." support
Next: Backup history file should be replicated in Streaming Replication?
From: Daniel Farina on 26 Nov 2009 00:49 On Wed, Nov 25, 2009 at 9:35 PM, Andrew Dunstan <andrew(a)dunslane.net> wrote: > On Wed, November 25, 2009 3:56 pm, Jeff Davis wrote: >> >> I worry that we're getting further away from the original problem. Let's >> allow functions to get the bytes of data from a COPY, like the original >> proposal. I am not sure COPY is the best mechanism to move records >> around when INSERT ... SELECT already does that. >> > > > I am not at all sure I think that's a good idea, though. We have > pg_read_file() for getting raw bytes from files. Building that into COPY > does not strike me as a good fit. I think we speak of the opposite direction... fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Daniel Farina on 26 Nov 2009 21:30 On Thu, Nov 26, 2009 at 6:13 PM, David Fetter <david(a)fetter.org> wrote: > It'd be nice to make this available to the next revision of DBI-Link > and it'll be pretty handy for our SQL/MED whenever that happens. Okay, so this thread sort of wandered into how we could refactor other elements of COPY. Do we have a good sense on what we should do to the current patch (or at least the idea represented by it) to get it into a committable state within finite time? I think adding a bytea and/or text mode is once such improvement...I am still reluctant to give up on INTERNAL because the string buffer passed in the INTERNAL scenario is ideal for C programmers -- the interface is even simpler than dealing with varlena types. But I agree that auxiliary modes should exist to enable easier hacking. The thorniest issue in my mind is how state can be initialized retained and/or modified between calls to the bytestream-acceptance function. Arguably it is already in a state where it is no worse than dblink, which itself has a global hash table to manage state. Also, if you look carefully at the dblink test suite I submitted, you'll see an interesting trick: one can COPY from multiple sources consecutively to a single COPY on a remote node when in text mode (binary mode has a header that cannot be so neatly catenated). This is something that's pretty hard to enable with any automatic startup-work-cleanup approach. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Daniel Farina on 29 Nov 2009 21:53 On Sun, Nov 29, 2009 at 6:35 PM, Jeff Davis <pgsql(a)j-davis.com> wrote: > What if the network buffer is flushed in the middle of a line? Is that > possible, or is there a guard against that somewhere? What do you mean? They both catenate onto one stream of bytes, it shouldn't matter where the flush boundaries are... It so happens as a convenient property of the textual modes is that adding more payload is purely concatenative (not true for binary, where there's a header that would cause confusion to the receiving side) fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Daniel Farina on 5 Dec 2009 03:32 On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(a)2ndquadrant.com> wrote: > Jeff Davis wrote: > > COPY target FROM FUNCTION foo() WITH RECORDS; > > > In what format would the records be? As a not-quite aside, I think I have a better syntax suggestion. I have recently been digging around in the grammar with the changes made in the following commit: commit a6833fbb85cb5212a9d8085849e7281807f732a6 Author: Tom Lane <tgl(a)sss.pgh.pa.us> Date: Mon Sep 21 20:10:21 2009 +0000 Define a new, more extensible syntax for COPY options. This is intentionally similar to the recently revised syntax for EXPLAIN options, ie, (name value, ...). The old syntax is still supported for backwards compatibility, but we intend that any options added in future will be provided only in the new syntax. Robert Haas, Emmanuel Cecchet As it turns out, the following syntax may work pretty well: COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42)) Basically the func_expr reduction fits very neatly into the copy_generic_opt_elem reduction: copy_generic_opt_elem: ColLabel copy_generic_opt_arg { $$ = (Node *) makeDefElem($1, $2); } | ColLabel func_expr { $$ = (Node *) $2; } ; Now we can use more or less any reasonable number of symbol names and function calls we desire. This makes life considerably easier, I think... We can also try to refactor COPY's internals to take advantage of these features (and potentially reduce the number of mechanisms. For example, the legacy "COPY ... TO '/place' WITH CSV" perhaps can be more verbosely/generically expressed as: COPY ... TO FUNCTION (setup_function to_file('/place'), record_converter csv_formatter, stream_function fwrite end_function fclose); We can also add auxiliary symbols for error handling behavior. For example, were the COPY to fail for some reason maybe it would make sense "on_error" to call "unlink" to clean up the partially finished file. I also have what I think is a somewhat interesting hack. Consider some of the functions up there without arguments (presumably they'd be called with a somewhat fixed contract the mechanics of COPY itself): how does one disambiguate them? Ideally, one could sometimes use literal arguments (when the return value of that function is desired to be threaded through the other specified functions) and other times it'd be nice to disambiguate functions via type names. That would look something like the following: COPY ... TO FUNCTION (setup_function to_file('/place'), record_converter csv_formatter(record), stream_function fwrite(bytea), end_function fclose(text)); I think this is possible to implement without much ambiguity, drawing on the observation that the COPY statement does not have -- and probably will never have -- references via Var(iable) node, unlike normal SQL statements such as SELECT, INSERT, et al. That means we might be able disambiguate using the following rules when scanning the funcexpr's arguments during the semantic analysis phase to figure out what to do: * Only literal list items found: it's a function call with the types of those literals. Ex: my_setup_function('foo'::text, 3) * Only non-literal list items found: it's type specifiers. Ex: csv_formatter(record). * Both literal and non-literal values found: report an error. This only works because no cases where a non-literal quantity could be confused with a type name come to mind. If one could name a type "3" and being forced to double-quote "3" to get your type disambiguated was just too ugly, then we are at an impasse. But otherwise I think this may work quite well. Common constellations of functions could perhaps be bound together into a DDL to reduce the amount of symbol soup going on here, but that seems like a pretty clean transition strategy at some later time. Most of the functionality could still be captured with this simple approach for now... Also note that factoring out high-performance implementations of things like csv_formatter (and friends: pg_binary_formatter) will probably take some time, but ultimately I think all the existing functionality could be realized as a layer of syntactic sugar over this mechanism. fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 29 Dec 2009 20:57
On Sat, Dec 5, 2009 at 3:32 AM, Daniel Farina <drfarina(a)acm.org> wrote: > On Mon, Nov 30, 2009 at 12:14 PM, Greg Smith <greg(a)2ndquadrant.com> wrote: >> Jeff Davis wrote: >> >> COPY target FROM FUNCTION foo() WITH RECORDS; >> >> >> In what format would the records be? > > As a not-quite aside, I think I have a better syntax suggestion. I > have recently been digging around in the grammar with the changes made > in the following commit: > > commit a6833fbb85cb5212a9d8085849e7281807f732a6 > Author: Tom Lane <tgl(a)sss.pgh.pa.us> > Date: Mon Sep 21 20:10:21 2009 +0000 > > Define a new, more extensible syntax for COPY options. > > This is intentionally similar to the recently revised syntax for EXPLAIN > options, ie, (name value, ...). The old syntax is still supported for > backwards compatibility, but we intend that any options added in future > will be provided only in the new syntax. > > Robert Haas, Emmanuel Cecchet > > As it turns out, the following syntax may work pretty well: > > COPY y TO FUNCTION (setup_function the_setup_function('some arg', 3, 7, 42)) > > Basically the func_expr reduction fits very neatly into the > copy_generic_opt_elem reduction: > > copy_generic_opt_elem: > ColLabel copy_generic_opt_arg > { > $$ = (Node *) makeDefElem($1, $2); > } > | ColLabel func_expr > { > $$ = (Node *) $2; > } > ; > > Now we can use more or less any reasonable number of symbol names and > function calls we desire. This makes life considerably easier, I > think... > > We can also try to refactor COPY's internals to take advantage of > these features (and potentially reduce the number of mechanisms. For > example, the legacy "COPY ... TO '/place' WITH CSV" perhaps can be > more verbosely/generically expressed as: > > COPY ... TO FUNCTION (setup_function to_file('/place'), > record_converter csv_formatter, > stream_function fwrite > end_function fclose); > > We can also add auxiliary symbols for error handling behavior. For > example, were the COPY to fail for some reason maybe it would make > sense "on_error" to call "unlink" to clean up the partially finished > file. > > I also have what I think is a somewhat interesting hack. Consider > some of the functions up there without arguments (presumably they'd be > called with a somewhat fixed contract the mechanics of COPY itself): > how does one disambiguate them? Ideally, one could sometimes use > literal arguments (when the return value of that function is desired > to be threaded through the other specified functions) and other times > it'd be nice to disambiguate functions via type names. That would > look something like the following: > > COPY ... TO FUNCTION (setup_function to_file('/place'), > record_converter csv_formatter(record), > stream_function fwrite(bytea), > end_function fclose(text)); > > I think this is possible to implement without much ambiguity, drawing > on the observation that the COPY statement does not have -- and > probably will never have -- references via Var(iable) node, unlike > normal SQL statements such as SELECT, INSERT, et al. That means we > might be able disambiguate using the following rules when scanning the > funcexpr's arguments during the semantic analysis phase to figure out > what to do: > > * Only literal list items found: it's a function call with the types > of those literals. Ex: my_setup_function('foo'::text, 3) > > * Only non-literal list items found: it's type specifiers. Ex: > csv_formatter(record). > > * Both literal and non-literal values found: report an error. > > This only works because no cases where a non-literal quantity could be > confused with a type name come to mind. If one could name a type "3" > and being forced to double-quote "3" to get your type disambiguated > was just too ugly, then we are at an impasse. But otherwise I think > this may work quite well. > > Common constellations of functions could perhaps be bound together > into a DDL to reduce the amount of symbol soup going on here, but that > seems like a pretty clean transition strategy at some later time. > Most of the functionality could still be captured with this simple > approach for now... > > Also note that factoring out high-performance implementations of > things like csv_formatter (and friends: pg_binary_formatter) will > probably take some time, but ultimately I think all the existing > functionality could be realized as a layer of syntactic sugar over > this mechanism. I am very fuzzy on where we stand with this patch. There's a link to the initial version on the commitfest application, but there has been so much discussion since then that I think there are probably some revisions to be made, though I can't say I know exactly what they are. I did also check the git branch, but it does not merge cleanly with the master. Is there a more recent version? Is this still under development? Or have we decided to go a different direction with it? Thanks, ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |