From: Daniel Farina on 24 Nov 2009 05:51

On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(a)2ndquadrant.com> wrote:
> Can't you use existing aggregate function design ?
>
> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>     SFUNC = sfunc,
>     STYPE = state_data_type
>     [ , FINALFUNC = ffunc ]
>     [ , INITCOND = initial_condition ]
>     [ , SORTOP = sort_operator ]
> )

Actually, yes.  I just thought that this was an idea so crazy that no
one would like it.

fdr
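For readers who have not written one, Hannu's CREATE AGGREGATE skeleton
maps onto the COPY-target idea roughly as follows.  This is a minimal,
self-contained sketch: the copy_dest_* names are invented for
illustration, and the "state" is just an in-memory text array standing
in for an open file or connection.

CREATE FUNCTION copy_dest_append(state text[], line text)
RETURNS text[] AS $$
    -- SFUNC: consume one row's worth of COPY output
    SELECT $1 || $2;
$$ LANGUAGE sql;

CREATE FUNCTION copy_dest_finish(state text[])
RETURNS integer AS $$
    -- FINALFUNC: "flush/close" and report how many lines were absorbed
    SELECT coalesce(array_length($1, 1), 0);
$$ LANGUAGE sql;

CREATE AGGREGATE copy_dest(text) (
    SFUNC     = copy_dest_append,
    STYPE     = text[],
    FINALFUNC = copy_dest_finish,
    INITCOND  = '{}'
);

-- e.g. SELECT copy_dest(t::text) FROM generate_series(1, 3) AS t;  -- returns 3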
From: Daniel Farina on 24 Nov 2009 05:56

On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(a)gmail.com> wrote:
> On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(a)2ndquadrant.com> wrote:
>> Can't you use existing aggregate function design ?
>>
>> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>>     SFUNC = sfunc,
>>     STYPE = state_data_type
>>     [ , FINALFUNC = ffunc ]
>>     [ , INITCOND = initial_condition ]
>>     [ , SORTOP = sort_operator ]
>> )
>
> Actually, yes.  I just thought that this was an idea so crazy that no
> one would like it.

Oh, and the other elephant in the room: error handling.  How to handle
error conditions...try/catch/finally type stuff.  Aggregates do not
necessarily provide a slot for this one.

I did consider using aggregates, though, but somehow it felt to me like
"I need at least a three-tuple, why not fish around for any random
bundling of three functions..."  After all, I would not want to
actually call the nodeAgg stuff to apply the function anyway...so it'd
basically be abused as a three-tuple of functions.

Also, what if you wanted, say, to replace the mechanism for COPY TO
'file'?  It'd be nice to make the following interaction (which uses
some implied global variables) not rely on such global variables:

BEGIN;
select open_file('/tmp/file', 'w+');
copy foo to function write_to_file;
-- what happens here if COPY aborts?  Does the transaction being in
-- the error state mean that files will not get closed?
select close_file();
COMMIT;

fdr
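One way to get at least "finally"-ish behaviour out of the interaction
above, as things stand, would be to push it into a PL/pgSQL wrapper.
A sketch, assuming the open_file/write_to_file/close_file helpers are
the implied ones from the example (they are not real functions), and
noting that the file handle is still process-global state, which is
the actual objection:

CREATE FUNCTION copy_foo_to_file(path text) RETURNS void AS $$
BEGIN
    PERFORM open_file(path, 'w+');
    EXECUTE 'COPY foo TO FUNCTION write_to_file';
    PERFORM close_file();
EXCEPTION WHEN OTHERS THEN
    -- best-effort cleanup if the COPY aborts; the handle itself is
    -- still a global, which is what the aggregate-state idea avoids
    PERFORM close_file();
    RAISE;
END;
$$ LANGUAGE plpgsql;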
From: Daniel Farina on 24 Nov 2009 06:48

On Tue, Nov 24, 2009 at 3:25 AM, Hannu Krosing <hannu(a)2ndquadrant.com> wrote:
> On Tue, 2009-11-24 at 02:56 -0800, Daniel Farina wrote:
>> On Tue, Nov 24, 2009 at 2:51 AM, Daniel Farina <drfarina(a)gmail.com> wrote:
>> > On Tue, Nov 24, 2009 at 2:50 AM, Hannu Krosing <hannu(a)2ndquadrant.com> wrote:
>> >> Can't you use existing aggregate function design ?
>> >>
>> >> CREATE AGGREGATE name ( input_data_type [ , ... ] ) (
>> >>     SFUNC = sfunc,
>> >>     STYPE = state_data_type
>> >>     [ , FINALFUNC = ffunc ]
>> >>     [ , INITCOND = initial_condition ]
>> >>     [ , SORTOP = sort_operator ]
>> >> )
>> >
>> > Actually, yes.  I just thought that this was an idea so crazy that no
>> > one would like it.
>
> seems kind of natural choice for me - in essence this is an aggregate
> function, aggregating over rows/tuples supplied to it.

Okay, well, maybe that wasn't such a crazy idea after all...

>> Oh, and the other elephant in the room: error handling.  How to handle
>> error conditions...try/catch/finally type stuff.
>
> Same as current aggregates - either ignore the error, log it and
> continue, or bail out
> [snip]
> Neither do ordinary functions, we have no "ON ERROR DO ..." clause for
> function definitions

It is assumed most functions do not have side effects outside the
database, so this is gotten rather for free.  The driving use case for
this *is* side effects on other systems.  I'm not sure that
justification is as easy to use here...normally rollbacks just take
care of all the error handling a function would want.  Here I'm not so
sure that is as common a case.

>> I did consider using aggregates though, but somehow it felt to me like
>> "I need at least a three-tuple, why not fish around for any random
>> bundling of three functions..."
>
> Why do you need three?

I'm counting the aggregate prototype itself to refer to the bundle,
which I suppose would more normally be considered a two-tuple of
functions.  It is a self-referential tuple, I suppose...

>> After all, I would not want to actually call the nodeAgg stuff to
>> apply the function anyway...so it'd basically be abused as a
>> three-tuple of functions.
>
> Actually it would be best if it could use straight generic functions, so
> you could do something like
>
> COPY stdin TO filterfunc(int) TO avg(int);

Generic functions?  Do you mean just scalar functions?  That'd be neat,
but as I said previously, composition could just be wrapped into a
function of the user's choice.

Also, what about use of multi-function-apply?

COPY stdin TO replicant1(datum) AND replicant2(datum);

You could imagine all sorts of new 2PC evil.  But again, one could just
write a little function to absorb the rows and dole them out without
bloating COPY syntax...  I am in no way suggesting that syntax,
seriously or unseriously.

> pass the file name in as an argument to SFUNC, open it on first call,
> ignore later (if it stays the same ;)

So either you are going to pass it with every row and ignore it, or
create a new initial aggregate state for each COPY TO FUNCTION...how
are you going to get it passed to SFUNC?

fdr
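To make the "pass the file name to SFUNC, open on first call" idea
concrete: the open/closed status can ride along in the aggregate's
transition state rather than in a global.  A sketch, assuming
hypothetical file_open/file_write/file_close helpers (none of these
exist; the point is only where the state lives):

CREATE FUNCTION write_line_sfunc(handle integer, path text, line text)
RETURNS integer AS $$
BEGIN
    IF handle IS NULL THEN
        handle := file_open(path, 'w+');   -- first row: open the file
    END IF;
    PERFORM file_write(handle, line);
    RETURN handle;                         -- the handle *is* the state
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION write_line_final(handle integer) RETURNS integer AS $$
    SELECT file_close($1);                 -- flush and close at end of COPY
$$ LANGUAGE sql;

CREATE AGGREGATE write_lines(text, text) (
    SFUNC     = write_line_sfunc,
    STYPE     = integer,
    FINALFUNC = write_line_final
);

-- e.g. SELECT write_lines('/tmp/file', some_text_column) FROM foo;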
From: Pavel Stehule on 24 Nov 2009 07:37

2009/11/24 Daniel Farina <drfarina(a)gmail.com>:
> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>> Hello
>>
>> I think this patch is maybe a good idea.  I am missing a better
>> function specification.  Specification by name isn't enough - we can
>> have overloaded functions.  This syntax doesn't allow an explicit
>> cast - from my personal view, the syntax is ugly - with a type
>> specification we don't need the keyword FUNCTION.
>
> As long as things continue to support the INTERNAL-type behavior for
> extremely low overhead bulk transfers I am open to suggestions about
> how to enrich things...but how would I do so under this proposal?

Using an INTERNAL type is wrong.  It breaks the design of these
functions for the usual PLs.  I don't see any reason why it's
necessary.

> I am especially fishing for suggestions in the direction of managing
> state for the function between rows though...I don't like how the
> current design seems to scream "use a global variable."
>
>> We have a fast COPY statement - ok, we have a fast function - ok, but
>> inside a function we have to call a "slow" SQL query.  Personally,
>> what is the advantage?
>
> The implementation here uses a type 'internal' for performance.  It
> doesn't even recompute the fcinfo because of the very particular
> circumstances of how the function is called.  It doesn't do a memory
> copy of the argument buffer either, to the best of my knowledge.  In
> the dblink patches you basically stream directly from the disk, format
> the COPY bytes, and shove it into a waiting COPY on another postgres
> node...there's almost no additional work in-between.  All utilized
> time would be some combination of the normal COPY byte stream
> generation and libpq.

I understand, and I dislike it.  This design isn't general - or rather,
it is far from using a function.  It doesn't use the complete FUNCAPI
interface.  I think you need different semantics.  You are not using a
function; you are using something like a "stream object".  This stream
object can have an input function, an output function, and parameters
that could be internal (I don't think internal carries any significant
performance benefit here) or standard.  The syntax should be similar to
CREATE AGGREGATE.

Then the syntax should be:

COPY table TO streamname(parameters)

COPY table TO filestream('/tmp/foo.dta') ...
COPY table TO dblinkstream(connectionstring) ...

This design is only an idea; it's not the important part.  What is
important is the limited design: it is not possible to use PLs, mainly
untrusted PLs.  Using an internal type is a simple hack.

Pavel

> This, of course, presumes that everyone who is interested in building
> on this is going to use some UDFs written in C...
>
>>
>> We need pipes, like
>>
>> COPY table TO foo(..) TO table
>>
>> foo() should be a transformation function, or a real pipe function
>
> I've actually considered this pipe thing with a colleague while
> driving home from work...it occurred to us that it would be nice to
> have both pipes and tees (basically composition vs. mapping
> application of functions over the input) in some form.  Not sure what
> an elegant way to express that is or how to control it.  Since you can
> work around this by composing or applying functions on your own in
> another function, I'm not sure if that's as high priority for me
> personally.
>
> fdr
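Purely to make the shape of Pavel's "stream object" proposal concrete,
a hypothetical DDL in the spirit of CREATE AGGREGATE might read like
the following.  Neither CREATE STREAM nor the fs_* support functions
exist anywhere; this is only an illustration of the proposal, not
anything in the patch.

CREATE STREAM filestream(text) (
    INITFUNC  = fs_open,     -- receives the COPY-time parameters, returns stream state
    WRITEFUNC = fs_write,    -- called once per row with the stream state
    FINALFUNC = fs_close     -- called at end of COPY (or on error)
);

-- Usage, per the examples above:
--   COPY table TO filestream('/tmp/foo.dta');
--   COPY table TO dblinkstream('host=replica dbname=mirror');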
From: Daniel Farina on 24 Nov 2009 08:00

On Tue, Nov 24, 2009 at 4:37 AM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
> 2009/11/24 Daniel Farina <drfarina(a)gmail.com>:
>> On Tue, Nov 24, 2009 at 2:10 AM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>>> Hello
>>>
>>> I think this patch is maybe a good idea.  I am missing a better
>>> function specification.  Specification by name isn't enough - we can
>>> have overloaded functions.  This syntax doesn't allow an explicit
>>> cast - from my personal view, the syntax is ugly - with a type
>>> specification we don't need the keyword FUNCTION.
>>
>> As long as things continue to support the INTERNAL-type behavior for
>> extremely low overhead bulk transfers I am open to suggestions about
>> how to enrich things...but how would I do so under this proposal?
>
> Using an INTERNAL type is wrong.  It breaks the design of these
> functions for the usual PLs.  I don't see any reason why it's
> necessary.
>
>> I am especially fishing for suggestions in the direction of managing
>> state for the function between rows though...I don't like how the
>> current design seems to scream "use a global variable."
>>
>>> We have a fast COPY statement - ok, we have a fast function - ok, but
>>> inside a function we have to call a "slow" SQL query.  Personally,
>>> what is the advantage?
>>
>> The implementation here uses a type 'internal' for performance.  It
>> doesn't even recompute the fcinfo because of the very particular
>> circumstances of how the function is called.  It doesn't do a memory
>> copy of the argument buffer either, to the best of my knowledge.  In
>> the dblink patches you basically stream directly from the disk, format
>> the COPY bytes, and shove it into a waiting COPY on another postgres
>> node...there's almost no additional work in-between.  All utilized
>> time would be some combination of the normal COPY byte stream
>> generation and libpq.
>
> I understand, and I dislike it.  This design isn't general - or rather,
> it is far from using a function.  It doesn't use the complete FUNCAPI
> interface.  I think you need different semantics.  You are not using a
> function; you are using something like a "stream object".  This stream
> object can have an input function, an output function, and parameters
> that could be internal (I don't think internal carries any significant
> performance benefit here) or standard.  The syntax should be similar
> to CREATE AGGREGATE.

I think you might be right about this.  At the time I was too shy to
add a DDL command for this hack, though.  But what I did want is a form
of currying, and that's not easily accomplished in SQL without
extension...

> Then the syntax should be:
>
> COPY table TO streamname(parameters)
>
> COPY table TO filestream('/tmp/foo.dta') ...
> COPY table TO dblinkstream(connectionstring) ...

I like this one quite a bit...it's a bit like an aggregate, except the
initial condition can be set in a rather function-callish way.  But
that does seem to require adding a DDL command, which leaves a nice
green field.  In particular, we could then make as many hooks, flags,
and options as we wanted, but sometimes there is a paradox of
choice...I just did not want to count on Postgres being friendly to a
new DDL command when writing this the first time.

fdr
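The "currying" Daniel mentions can be approximated today by baking the
fixed parameters into a wrapper function and naming the wrapper as the
COPY target.  A sketch: send_line is a hypothetical per-row sink, and
only the patch's COPY ... TO FUNCTION syntax comes from the thread.

CREATE FUNCTION write_to_tmp_file(line text) RETURNS void AS $$
    -- the destination path is "curried" in here rather than passed per row
    SELECT send_line('/tmp/file', $1);
$$ LANGUAGE sql;

-- COPY foo TO FUNCTION write_to_tmp_file;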