From: Tom Lane on
Daniel Farina <drfarina(a)acm.org> writes:
> Generally I think the delimited untoasting of metadata from arrays
> separately from the payload is Not A Bad Idea.

I looked at this patch a bit. I agree that it could be a big win for
large external arrays, but ...

1. As-is, it's a significant *pessimization* for small arrays, because
the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
is not needed because the data is already not toasted. I think there
needs to be a code path that avoids that.

2. Arrays that are large enough to be pushed out to toast storage are
almost certainly going to get compressed first. The potential win in
this case is very limited because heap_tuple_untoast_attr_slice will
fetch and decompress the whole thing. Admittedly this is a limitation
of the existing code and not a fault of the patch proper, but still, if
you want to make something that's generically useful, you need to do
something about that. Perhaps pglz_decompress() could be extended with
an argument to say "decompress no more than this much" --- although that
would mean adding another test to its inner loop, so we'd need to check
for performance degradation. I'm also unsure how to predict how much
compressed data needs to be read in to get at least N bytes of
decompressed data.
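To illustrate (a sketch only, with a made-up name; today's signature is
just "void pglz_decompress(const PGLZ_Header *source, char *dest)"):

    /*
     * Hypothetical bounded variant: produce at most max_out bytes of
     * decompressed data and report how many were actually written.
     */
    int32
    pglz_decompress_limit(const PGLZ_Header *source, char *dest, int32 max_out)
    {
        char   *dp = dest;
        char   *destend = dest + Min(PGLZ_RAW_SIZE(source), max_out);

        /* ... control-byte loop as in the existing code ... */

        /* The history-copy loop is where the extra test would land: */
        while (len-- && dp < destend)   /* "dp < destend" is the new test */
        {
            *dp = dp[-off];
            dp++;
        }

        /* ... */
        return (int32) (dp - dest);
    }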

regards, tom lane

From: Robert Haas on
On Fri, Jul 16, 2010 at 4:43 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Daniel Farina <drfarina(a)acm.org> writes:
>> Generally I think the delimited untoasting of metadata from arrays
>> separately from the payload is Not A Bad Idea.
>
> I looked at this patch a bit. I agree that it could be a big win for
> large external arrays, but ...
>
> 1. As-is, it's a significant *pessimization* for small arrays, because
> the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
> is not needed because the data is already not toasted. I think there
> needs to be a code path that avoids that.

This seems like it shouldn't be too hard to fix, and I think it should be fixed.

> 2. Arrays that are large enough to be pushed out to toast storage are
> almost certainly going to get compressed first. The potential win in
> this case is very limited because heap_tuple_untoast_attr_slice will
> fetch and decompress the whole thing. Admittedly this is a limitation
> of the existing code and not a fault of the patch proper, but still, if
> you want to make something that's generically useful, you need to do
> something about that. Perhaps pglz_decompress() could be extended with
> an argument to say "decompress no more than this much" --- although that
> would mean adding another test to its inner loop, so we'd need to check
> for performance degradation. I'm also unsure how to predict how much
> compressed data needs to be read in to get at least N bytes of
> decompressed data.

But this part seems to me to be something that can, and probably
should, be left for a separate patch. I don't see that we're losing
anything this way; we're just not winning as much as we otherwise
might. To really fix this, I suspect we'd need a version of
pglz_decompress that can operate in "streaming mode"; you tell it how
many bytes you want, and it returns a code indicating that either (a)
it decompressed that many bytes or (b) it decompressed less than that
many bytes and you can call it again with another chunk of data if you
want to keep going. That'd probably be a good thing to have, but I
don't think it needs to be a prerequisite for this patch.
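To make that concrete, I'm imagining an interface along these lines
(entirely hypothetical; nothing like it exists today):

    /*
     * Opaque decompression state: input position, any partially-read
     * tag, and enough recent output to resolve back-references (pglz
     * offsets are 12 bits, so at most the last 4095 bytes).
     */
    typedef struct PGLZ_DecompressState PGLZ_DecompressState;

    typedef enum
    {
        PGLZ_STREAM_DONE,       /* produced all nbytes requested */
        PGLZ_STREAM_NEED_INPUT  /* produced fewer; feed more source data
                                 * and call again to keep going */
    } PGLZ_StreamResult;

    extern PGLZ_DecompressState *pglz_decompress_start(void);
    extern PGLZ_StreamResult pglz_decompress_next(PGLZ_DecompressState *state,
                                                  const char *source, int32 slen,
                                                  char *dest, int32 nbytes,
                                                  int32 *nwritten);
    extern void pglz_decompress_end(PGLZ_DecompressState *state);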

I'm going to mark this patch "Waiting on Author".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: Mike Lewis on
> > 1. As-is, it's a significant *pessimization* for small arrays, because
> > the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
> > is not needed because the data is already not toasted. I think there
> > needs to be a code path that avoids that.
>
> This seems like it shouldn't be too hard to fix, and I think it should be
> fixed.
>

Do you have any suggestions where to start? I do agree that this should be
fixed as well. I don't have too much time to dedicate to this project. I
can try to put in some time this weekend though if it isn't looking too bad.


> > 2. Arrays that are large enough to be pushed out to toast storage are
> > almost certainly going to get compressed first. The potential win in
> > this case is very limited because heap_tuple_untoast_attr_slice will
> > fetch and decompress the whole thing. Admittedly this is a limitation
> > of the existing code and not a fault of the patch proper, but still, if
> > you want to make something that's generically useful, you need to do
> > something about that. Perhaps pglz_decompress() could be extended with
> > an argument to say "decompress no more than this much" --- although that
> > would mean adding another test to its inner loop, so we'd need to check
> > for performance degradation. I'm also unsure how to predict how much
> > compressed data needs to be read in to get at least N bytes of
> > decompressed data.
>
> But this part seems to me to be something that can, and probably
> should, be left for a separate patch. I don't see that we're losing
> anything this way; we're just not winning as much as we otherwise
> might. To really fix this, I suspect we'd need a version of
> pglz_decompress that can operate in "streaming mode"; you tell it how
> many bytes you want, and it returns a code indicating that either (a)
> it decompressed that many bytes or (b) it decompressed less than that
> many bytes and you can call it again with another chunk of data if you
> want to keep going. That'd probably be a good thing to have, but I
> don't think it needs to be a prerequisite for this patch.
>

Hopefully that's the case. In the meantime, I can try tackling the first part.

Thanks,
Mike
--
Michael Lewis
lolrus.org
mikelikespie(a)gmail.com

From: Robert Haas on
On Wed, Jul 28, 2010 at 1:20 AM, Mike Lewis <mikelikespie(a)gmail.com> wrote:
>>
>> > 1. As-is, it's a significant *pessimization* for small arrays, because
>> > the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
>> > is not needed because the data is already not toasted. I think there
>> > needs to be a code path that avoids that.
>>
>> This seems like it shouldn't be too hard to fix, and I think it should be
>> fixed.
>
> Do you have any suggestions where to start? I do agree that this should be
> fixed as well. I don't have too much time to dedicate to this project. I
> can try to put in some time this weekend though if it isn't looking too bad.

Perhaps you could check VARATT_IS_EXTENDED. If that's true, then
slice it, but if it's false, then just use the original datum. You
might want to wrap that up in a function rather than cramming it all
in the macro definition, though.
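Roughly like this, say (a sketch; the helper name is made up, and the
caller needs to remember whether it got the original datum back so it
knows not to pfree it):

    #include "postgres.h"
    #include "access/tuptoaster.h"

    /*
     * Fetch the first "length" bytes of an array datum, which is where
     * the metadata lives.  A plain, untoasted datum can be used in
     * place; only compressed or external values need the slicing
     * machinery (which pallocs a copy).
     */
    static struct varlena *
    array_metadata_slice(struct varlena *attr, int32 length)
    {
        if (!VARATT_IS_EXTENDED(attr))
            return attr;        /* used in place: do not pfree */

        return heap_tuple_untoast_attr_slice(attr, 0, length);
    }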

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From: "Kevin Grittner" on
Robert Haas <robertmhaas(a)gmail.com> wrote:
> On Wed, Jul 28, 2010 at 1:20 AM, Mike Lewis
> <mikelikespie(a)gmail.com> wrote:
>>>
>>> > 1. As-is, it's a significant *pessimization* for small arrays,
>>> > because the heap_tuple_untoast_attr_slice code does a
>>> > palloc/copy even when one is not needed because the data is
>>> > already not toasted. I think there needs to be a code path
>>> > that avoids that.
>>>
>>> This seems like it shouldn't be too hard to fix, and I think it
>>> should be fixed.
>>
>> Do you have any suggestions where to start? I do agree that this
>> should be fixed as well. I don't have too much time to dedicate
>> to this project. I can try to put in some time this weekend
>> though if it isn't looking too bad.
>
> Perhaps you could check VARATT_IS_EXTENDED. If that's true, then
> slice it, but if it's false, then just use the original datum.
> You might want to wrap that up in a function rather than cramming
> it all in the macro definition, though.

As Mike hasn't been able to find the time to get to this yet, I'm
marking this as "Returned with Feedback". Hopefully the issues can
be addressed in the next five weeks and we can pick it up again in
the next CommitFest.

-Kevin
