From: Pavel Stehule on 25 Nov 2009 01:09

2009/11/25 Daniel Farina <drfarina(a)gmail.com>:
> On Tue, Nov 24, 2009 at 9:35 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>> 2009/11/25 Daniel Farina <drfarina(a)gmail.com>:
>>> On Tue, Nov 24, 2009 at 8:45 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>>>> It depends on the design. I don't think "internal" is necessary. It is
>>>> just the wrong design.
>>>
>>> Depends on how lean you want to be when doing a large COPY...right now
>>> the cost is restricted to having to call a function pointer and a few
>>> branches. If you want to take SQL values, then the semantics of
>>> function calling over a large number of rows is probably notably more
>>> expensive, although I make no argument against the fact that the
>>> non-INTERNAL version would give a lot more people more utility.
>>
>> I believe that using "internal" minimizes the necessary changes in the
>> COPY implementation. Using the funcapi needs more work inside COPY - you
>> have to move some functionality from COPY into stream functions.
>> Probably the slowest operation is parsing - calling the input functions.
>> This is called once either way. The second slowest operation is reading
>> from the network - it is the same too. So I don't see many reasons why a
>> non-internal implementation has to be significantly slower than your
>> actual implementation. I am sure, though, that it needs more work.

"internal" is important (for performance) for aggregate functions, where it
protects against repeated alloc/free of memory - it works well, and it is a
more or less ugly hack. We cannot do some things well - some support is
simply missing. Nobody reckoned with very large string or array
concatenation at design time. That is the reason I am against using it.

> You are probably right. We could try coercing to bytea and back out
> to bytes, although it seems like a superfluous cost to force
> *everyone* to pay just to get the same bytes to a network buffer.

I am not sure this is a good analogy. Only a "filestream" or "network"
stream is a stream of bytes. From any sophisticated stream I take tuples -
a database stream, a SOAP stream. I agree that dblink could return
binary-compatible records - but that is one special and exclusive case.
Sure, an important one, and it has to be accounted for. Still, I think
dblink to postgres is another hack and should be replaced.

> fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
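[Editor's sketch of the trade-off under debate: an "internal"-style design hands the consumer raw formatted bytes through one callback, while a funcapi-style design makes a full function call per row. The Python below is illustrative only; `copy_to_callback` and `copy_to_row_function` are hypothetical names, not the patch's actual C API.]

```python
# Illustrative sketch (hypothetical names, not the patch's C API).
# "internal"-style: COPY formats rows itself and pushes opaque bytes
# through a single callback. funcapi-style: one function call per row,
# with the row's values passed as arguments.

def copy_to_callback(rows, write):
    """Push formatted bytes to a single destination callback."""
    for row in rows:
        write(("\t".join(map(str, row)) + "\n").encode())

def copy_to_row_function(rows, per_row_func):
    """Invoke a function once per row, passing the row's values."""
    for row in rows:
        per_row_func(*row)

buf = bytearray()
copy_to_callback([(1, "a"), (2, "b")], buf.extend)
# buf now holds b"1\ta\n2\tb\n"

received = []
copy_to_row_function([(1, "a"), (2, "b")],
                     lambda *vals: received.append(vals))
# received == [(1, "a"), (2, "b")]
```

The byte-stream path pays one indirect call per buffer; the per-row path pays full call overhead (and value marshalling) once per row, which is the cost Daniel is pointing at for large COPYs.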
From: Pavel Stehule on 25 Nov 2009 01:31

2009/11/25 Jeff Davis <pgsql(a)j-davis.com>:
> On Tue, 2009-11-24 at 21:42 -0800, Daniel Farina wrote:
>> You are probably right. We could try coercing to bytea and back out
>> to bytes, although it seems like a superfluous cost to force
>> *everyone* to pay just to get the same bytes to a network buffer.
>
> Well, I suppose only performance will tell. Copying a buffer is sure to
> be faster than invoking all of the type input/output functions, or even
> send/recv, so perhaps it's not a huge penalty.
>
> My disagreement with the row-by-row approach is more semantics than
> performance. COPY translates records to bytes and vice versa, and your
> original patch maintains those semantics.

uff, really? COPY CSV?

Pavel

> Regards,
>         Jeff Davis
From: Daniel Farina on 25 Nov 2009 01:35

On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>> I believe that using "internal" minimizes the necessary changes in the
>> COPY implementation. Using the funcapi needs more work inside COPY - you
>> have to move some functionality from COPY into stream functions.
>> Probably the slowest operation is parsing - calling the input functions.
>> This is called once either way. The second slowest operation is reading
>> from the network - it is the same too. So I don't see many reasons why a
>> non-internal implementation has to be significantly slower than your
>> actual implementation. I am sure, though, that it needs more work.
>
> I apologize, but I don't understand what you're saying. Can you please
> restate with some examples?
>
> It seems like you're advocating that we move records from a table into a
> function using COPY. But that's not what COPY normally does: COPY
> normally translates records to bytes or bytes to records.

Perhaps what we want is pluggable transformation functions that can format
the row any way that is desired, with the current behavior being some
default. Putting COPY TO FUNCTION as submitted aside, what about something
like this:

COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);

This is something completely different from what was submitted, so in some
respect:

COPY foo TO FUNCTION dblink_send_row USING
postgres_builtin_formatter(binary = true);

would compose the two features...

(Again, very, very far from a real syntax suggestion)

fdr
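[Editor's sketch of the composition being floated: a pluggable formatter (row to bytes) combined with a pluggable destination (bytes consumer), with COPY's core loop staying format-agnostic. All names below are hypothetical illustrations, not the proposed syntax or any PostgreSQL code.]

```python
# Illustrative sketch, hypothetical names: COPY composes a formatter
# (row -> bytes) with a destination (bytes consumer); swapping either
# piece changes the output format or the target independently.

import csv
import io

def csv_formatter(row):
    """Format one row as a CSV line (roughly the 'csv = true' behavior)."""
    out = io.StringIO()
    csv.writer(out).writerow(row)
    return out.getvalue().encode()

def run_copy(rows, formatter, destination):
    """The COPY loop: format each row, hand the bytes to the destination."""
    for row in rows:
        destination(formatter(row))

chunks = []  # stand-in for a file, or a dblink-style send function
run_copy([(1, "a,b"), (2, "c")], csv_formatter, chunks.append)
# chunks == [b'1,"a,b"\r\n', b'2,c\r\n']
```

Under this factoring, `COPY ... TO FUNCTION` picks the destination and `USING ...` picks the formatter, which is why the two features compose rather than compete.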
From: Pavel Stehule on 25 Nov 2009 01:36

2009/11/25 Jeff Davis <pgsql(a)j-davis.com>:
> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>> I believe that using "internal" minimizes the necessary changes in the
>> COPY implementation. Using the funcapi needs more work inside COPY - you
>> have to move some functionality from COPY into stream functions.
>> Probably the slowest operation is parsing - calling the input functions.
>> This is called once either way. The second slowest operation is reading
>> from the network - it is the same too. So I don't see many reasons why a
>> non-internal implementation has to be significantly slower than your
>> actual implementation. I am sure, though, that it needs more work.
>
> I apologize, but I don't understand what you're saying. Can you please
> restate with some examples?
>
> It seems like you're advocating that we move records from a table into a
> function using COPY. But that's not what COPY normally does: COPY
> normally translates records to bytes or bytes to records.
>
> Moving records from a table to a function can be done with:
>   SELECT myfunc(mytable) FROM mytable;
> already. The only problem is if you want initialization/destruction. But
> I'm not convinced that COPY is the best tool to provide that.
>
> Moving records from a function to a table can be done with:
>   INSERT INTO mytable SELECT * FROM myfunc();
> And that already works fine.

It works, but COPY FROM myfunc() should be significantly faster. You can
skip the tuple store.

Pavel

> So what use case are you concerned about?
>
> Regards,
>         Jeff Davis
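[Editor's sketch of the "skip the tuple store" point: `INSERT INTO mytable SELECT * FROM myfunc()` may first materialize the set-returning function's whole result, while a streaming COPY could consume each row as it is produced. The Python below is a caricature with hypothetical names, not server internals.]

```python
# Illustrative sketch: materialize-then-insert vs. streaming per-row
# consumption. Each consumer returns the peak number of rows it buffered.

def myfunc():
    """Stand-in for a set-returning function producing rows lazily."""
    for i in range(5):
        yield (i, i * i)

def insert_via_tuplestore(src, sink):
    tuplestore = list(src)        # whole result set buffered first
    for row in tuplestore:
        sink.append(row)
    return len(tuplestore)

def copy_from_function(src, sink):
    peak = 0
    for row in src:               # each row handed over as it is produced
        sink.append(row)
        peak = max(peak, 1)       # at most one row in flight
    return peak

a, b = [], []
peak_insert = insert_via_tuplestore(myfunc(), a)   # all 5 rows buffered
peak_copy = copy_from_function(myfunc(), b)        # one row at a time
```

Both paths deliver the same rows; the difference is the intermediate buffering (and the copying it implies), which is the overhead Pavel expects a streaming COPY FROM a function to avoid.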
From: Pavel Stehule on 25 Nov 2009 01:39

2009/11/25 Daniel Farina <drfarina(a)gmail.com>:
> On Tue, Nov 24, 2009 at 10:23 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
>> On Wed, 2009-11-25 at 06:35 +0100, Pavel Stehule wrote:
>>> I believe that using "internal" minimizes the necessary changes in the
>>> COPY implementation. Using the funcapi needs more work inside COPY - you
>>> have to move some functionality from COPY into stream functions.
>>> Probably the slowest operation is parsing - calling the input functions.
>>> This is called once either way. The second slowest operation is reading
>>> from the network - it is the same too. So I don't see many reasons why a
>>> non-internal implementation has to be significantly slower than your
>>> actual implementation. I am sure, though, that it needs more work.
>>
>> I apologize, but I don't understand what you're saying. Can you please
>> restate with some examples?
>>
>> It seems like you're advocating that we move records from a table into a
>> function using COPY. But that's not what COPY normally does: COPY
>> normally translates records to bytes or bytes to records.
>
> Perhaps what we want is pluggable transformation functions that can
> format the row any way that is desired, with the current behavior being
> some default. Putting COPY TO FUNCTION as submitted aside, what about
> something like this:
>
> COPY foo TO '/tmp/foo' USING postgres_builtin_formatter(csv = true);
>
> This is something completely different from what was submitted, so in
> some respect:
>
> COPY foo TO FUNCTION dblink_send_row USING
> postgres_builtin_formatter(binary = true);
>
> would compose the two features...

Yes - those are two features, and they should be solved independently.

Pavel

> (Again, very, very far from a real syntax suggestion)
>
> fdr