From: Pavel Stehule on 25 Nov 2009 02:03

2009/11/25 Pavel Stehule <pavel.stehule(a)gmail.com>:
> 2009/11/25 Daniel Farina <drfarina(a)gmail.com>:
>> On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
>>> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>>>> I believe that using an "internal" function minimizes the necessary
>>>> changes to the COPY implementation. Using funcapi needs more work
>>>> inside COPY: you have to move some functionality from COPY into the
>>>> stream functions. Probably the slowest operation is parsing, i.e.
>>>> calling the input functions; that is called once either way. The
>>>> second slowest operation is reading from the network, which is also
>>>> the same. So I don't see many reasons why a non-internal
>>>> implementation would have to be significantly slower than your
>>>> current implementation. I am sure it needs more work, though.
>>>
>>> I apologize, but I don't understand what you're saying. Can you
>>> please restate with some examples?
>>>
>>> It seems like you're advocating that we move records from a table
>>> into a function using COPY. But that's not what COPY normally does:
>>> COPY normally translates records to bytes or bytes to records.
>>
>> Perhaps what we want is pluggable transformation functions that can
>> format the row any way that is desired, with the current behavior
>> being some default. Putting COPY TO FUNCTION as submitted aside, what
>> about something like this:
>>
>> COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);
>>
>> This is something completely different from what was submitted, so in
>> some aspect:
>>
>> COPY foo TO FUNCTION dblink_send_row USING
>> postgres_builtin_formatter(binary = true);
>>
>> would compose the two features...
>
> Yes - these are two features, and they should be solved independently.

It is and it is not (some thinking needed): smarter streams should
accept/return tuples, while a formatting function makes sense for text
output - there are input/output formatting (text-based/bytea-based)
functions.

I see one possible problem: when the formatter functions are row based,
you cannot generate a prolog and epilog part of the file (e.g. for XML).

Pavel

> Pavel
>> (Again, very, very far from a real syntax suggestion)
>>
>> fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
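Pavel's prolog/epilog objection can be made concrete with a sketch. Assuming a strictly row-based formatter contract (hypothetical - no such API was part of the submitted patch), each row can be rendered as a self-contained fragment, but there is no hook for the document wrapper:

```sql
-- Per-row XML fragments are easy: each row is rendered independently.
-- xmlelement/xmlforest are stock PostgreSQL XML functions.
SELECT xmlelement(name item, xmlforest(f.a, f.b)) FROM foo f;
-- e.g. <item><a>1</a><b>x</b></item>
--      <item><a>2</a><b>y</b></item>

-- But a complete XML document needs a prolog and an epilog around the rows:
--   <?xml version="1.0"?>
--   <table name="foo"> ... per-row fragments ... </table>
-- A formatter that is invoked once per row has nowhere to emit those,
-- which argues for optional begin/end callbacks in any formatter API.
```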
From: Pavel Stehule on 25 Nov 2009 03:23

2009/11/25 Jeff Davis <pgsql(a)j-davis.com>:
> On Wed, 2009-11-25 at 07:36 +0100, Pavel Stehule wrote:
>>> Moving records from a function to a table can be done with:
>>>   INSERT INTO mytable SELECT * FROM myfunc();
>>> And that already works fine.
>>
>> It works, but COPY FROM myfunc() should be significantly faster. You
>> can skip the tuple store.
>
> If SRFs use a tuplestore in that situation, it sounds like that should
> be fixed. Why do we need to provide alternate syntax involving COPY?

It isn't a problem of the SRF design - that allows both modes, row by
row and tuplestore. This is a problem of the INSERT statement, or
rather of the INSERT INTO ... SELECT implementation.

Regards
Pavel

> Regards,
>         Jeff Davis
From: Daniel Farina on 25 Nov 2009 12:43

On Tue, Nov 24, 2009 at 10:39 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
> yes - it is two features - and should be solved independently

There are some common problems, though. I was thinking about this with
my existing mental model of specifying some parameters up-front and
then getting a stream of records or bytes (depending on which feature
you are referring to) as a form of currying, combined with my
not-complete revulsion at using aggregates as a base for such
functionality...

What if we extended aggregates to support a function as an initial
condition, which could be called with parameters when initializing the
aggregate? If you squint at it just right, the current form is that of
a value/constant - effectively the zero-parameter function. Here's the
gist of what I want to realize:

SELECT (avg())(column_name) FROM ...

This is a vanilla average. That's not very interesting, since avg only
has one sensible default initial value. However, at Truviso we have
encountered a real case where we wanted SUM to be initialized to "0"
instead of "NULL". I had to create a new aggregate with that as an
initial condition, which was fine because we only needed one extra
standard behavior. But perhaps instead it could have been written this
way:

SELECT (sum(0))(column_name) FROM ...

That way people could get zero rather than NULL when their query
yields no rows. You could also imagine code out there that keeps a
running sum of sorts and wants to seed SUM to some non-zero, non-NULL
initial value set by the application.

At that point we may be able to reuse the aggregate infrastructure to
do what we want in the case of these COPY extensions more easily...

fdr
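For reference, the workaround Daniel mentions - creating a separate aggregate per initial condition - looks like this in the existing CREATE AGGREGATE syntax; the name sum0 is made up for the example:

```sql
-- A SUM over integers that starts at 0 instead of NULL, built from the
-- stock int4pl transition function. The aggregate name sum0 is invented.
CREATE AGGREGATE sum0(int) (
    SFUNC    = int4pl,   -- state + next value
    STYPE    = int,
    INITCOND = '0'       -- initial condition baked in at creation time
);

-- SELECT sum0(i) FROM empty_table;  -- yields 0 where sum() yields NULL
```

The limitation Daniel is pointing at is visible here: INITCOND is a constant fixed at CREATE AGGREGATE time, so every distinct seed value requires its own aggregate.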
From: Tom Lane on 25 Nov 2009 13:34

Jeff Davis <pgsql(a)j-davis.com> writes:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>>> If SRFs use a tuplestore in that situation, it sounds like that should
>>> be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't problem of SRF function design. It allow both mode - row and
>> tuplestore.

> select * from generate_series(1,1000000000) limit 1;

> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF.

Yeah. This is certainly fixable if someone wants to do the legwork of
refactoring ExecMakeTableFunctionResult(). It was done that way
originally just to keep down the complexity of introducing the
function-as-table-source feature at all.

			regards, tom lane
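Tom's observation is easy to reproduce (behavior of the servers under discussion; timings are machine-dependent): the same SRF streams or materializes depending on where it appears in the query.

```sql
-- SRF in the FROM clause: ExecMakeTableFunctionResult() runs the
-- function to completion into a tuplestore before LIMIT can apply.
SELECT * FROM generate_series(1, 1000000000) LIMIT 1;  -- takes a long time

-- SRF in the target list: rows are produced value-per-call, so the
-- LIMIT stops execution after the first row.
SELECT generate_series(1, 1000000000) LIMIT 1;         -- returns immediately
```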
From: Pavel Stehule on 25 Nov 2009 23:01
2009/11/25 Jeff Davis <pgsql(a)j-davis.com>:
> On Wed, 2009-11-25 at 11:32 +0100, Pavel Stehule wrote:
>> 1.
>> postgres=# select count(*) from generate_series(1,1000000);
>>   count
>> ─────────
>>  1000000
>> (1 row)
>>
>> Time: 930,720 ms
>>
>> 2.
>> postgres=# select count(*) from (select generate_series(1,1000000)) x;
>>   count
>> ─────────
>>  1000000
>> (1 row)
>>
>> Time: 276,511 ms
>>
>> 2. is significantly faster than 1 (there is no SRF materialisation)
>
> I think case #1 can be fixed.
>
>> generate_series is fast and simple - but still COPY is about 30% faster
>
> My quick tests are not consistent enough, so I will have to try with
> more data. The times look similar to me so far.
>
> If there is a difference, I wonder what it is?
>
>> I think that materialisation happens every time you use any SQL
>> statement without a cursor.
>
> I don't think that is true. Here's an expanded version of my previous
> example:
>
> create table zero(i int);
> create table tmp(j int);
> insert into zero select 0 from generate_series(1,1000000); -- all 0
> insert into tmp select 1/i from zero; -- error immediately, doesn't wait
>
> The error would take longer if it materialized the table "zero". But
> instead, it passes the first tuple to the function for "/" before the
> other tuples are read, and gets an error immediately. So no
> materialization.

This shows nothing. It works like this:

1. EXECUTE SELECT 0 FROM generate_series(1,...);
2. STORE RESULT TO TABLE zero;
3. EXECUTE SELECT 1/i FROM zero;
4. STORE RESULT TO TABLE tmp;

The problem is in the sequential execution. The result is stored to the
destination after execution, so some materialisation is necessary.

> I worry that we're getting further away from the original problem.
> Let's allow functions to get the bytes of data from a COPY, like the
> original proposal. I am not sure COPY is the best mechanism to move
> records around when INSERT ... SELECT already does that.

For a one-off hack like this I would prefer using a hook, with the
feature stored in contrib. I don't see a general use for this feature.

Regards
Pavel Stehule

> Regards,
>         Jeff Davis