Red-black tree for GIN [PgSql]

From: Robert Haas on 10 Jan 2010 21:42

On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> My other question is as related to performance. Can you provide a
> test case that shows the performance improvement with this patch?

So, we still don't have a test case for this patch. During the
November CommitFest, Greg Smith griped a bit about the lack of a
reproducible performance benchmark for the XLogInsert patch:

http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php

...and I would say the same logic applies to this patch, maybe even
more so. Tom has already applied a partial workaround for this
problem, and I suspect it won't be trivial to figure out what to
measure, either to expose the remaining issue or to show how much
this new implementation helps.

The coding pattern that this patch uses also merits some discussion.
Basically, rbtree.c is a generic implementation of red-black trees -
from a textbook - which ginbulk.c then uses for GIN. One possible
advantage of this implementation is that it might make it possible for
us to use the rbtree.c logic in other places, if we have other data
structures that need similar treatment. But I'm not sure if that's
the way we want to go. The other alternative is to drop the
generalized implementation and incorporate the logic directly into
ginbulk.c. I really don't know which is better, but I'd like to hear
some other opinions...

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 10 Jan 2010 21:54

On Mon, Jan 11, 2010 at 2:42 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> The coding pattern that this patch uses also merits some discussion.
> Basically, rbtree.c is a generic implementation of red-black trees -
> from a textbook - which ginbulk.c then uses for GIN. One possible
> advantage of this implementation is that it might make it possible for
> us to use the rbtree.c logic in other places, if we have other data
> structures that need similar treatment. But I'm not sure if that's
> the way we want to go. The other alternative is to drop the
> generalized implementation and incorporate the logic directly into
> ginbulk.c. I really don't know which is better, but I'd like to hear
> some other opinions...
>

So, code reuse is not the only advantage of abstraction. It's also
just plain easier to understand and test code written with clear
abstract interfaces. The way you describe it, someone with no knowledge
of GIN could look at rbtree.c, see whether it's done correctly, and
maybe improve it. And someone reading ginbulk.c only has to understand
the operations red-black trees support, not how they're implemented,
to follow the code.
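
As a rough illustration of that separation (not code from the patch):
the sketch below puts a generic ordered-tree interface on one side and
a caller that only uses its operations on the other. All names here
(Tree, tree_init, tree_insert, count_distinct_words) are hypothetical
and are not the rbtree.c API, and the tree body is a plain unbalanced
binary search tree standing in for a red-black tree, so the rebalancing
is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- Generic side: knows nothing about what the keys mean. ---- */

/* Caller-supplied ordering: <0, 0, >0 like strcmp(). */
typedef int (*tree_cmp_fn)(const void *a, const void *b);

typedef struct TreeNode {
    struct TreeNode *left;
    struct TreeNode *right;
    const void *key;    /* owned by the caller, not copied */
    int count;          /* how many times this key was inserted */
} TreeNode;

typedef struct Tree {
    TreeNode *root;
    tree_cmp_fn cmp;
    size_t nnodes;      /* number of distinct keys */
} Tree;

static void tree_init(Tree *t, tree_cmp_fn cmp)
{
    assert(cmp != NULL);
    t->root = NULL;
    t->cmp = cmp;
    t->nnodes = 0;
}

/* Insert key, or bump the count of an equal existing key.
 * Returns the node, or NULL if allocation fails.
 * NOTE: no rebalancing here; a real red-black tree would recolor
 * and rotate after linking the new node. */
static TreeNode *tree_insert(Tree *t, const void *key)
{
    TreeNode **link = &t->root;

    while (*link != NULL) {
        int c = t->cmp(key, (*link)->key);
        if (c == 0) {
            (*link)->count++;
            return *link;
        }
        link = (c < 0) ? &(*link)->left : &(*link)->right;
    }

    TreeNode *n = malloc(sizeof(TreeNode));
    if (n == NULL)
        return NULL;
    n->left = n->right = NULL;
    n->key = key;
    n->count = 1;
    *link = n;
    t->nnodes++;
    return n;
}

static void tree_free_nodes(TreeNode *n)
{
    if (n == NULL)
        return;
    tree_free_nodes(n->left);
    tree_free_nodes(n->right);
    free(n);
}

static void tree_destroy(Tree *t)
{
    tree_free_nodes(t->root);
    t->root = NULL;
    t->nnodes = 0;
}

/* ---- Caller side: only uses init/insert/destroy. ---- */

static int cmp_cstring(const void *a, const void *b)
{
    return strcmp((const char *) a, (const char *) b);
}

/* Count distinct strings by feeding them through the tree. */
static size_t count_distinct_words(const char *const *words, size_t n)
{
    Tree t;
    size_t distinct;

    tree_init(&t, cmp_cstring);
    for (size_t i = 0; i < n; i++)
        tree_insert(&t, words[i]);
    distinct = t.nnodes;
    tree_destroy(&t);
    return distinct;
}
```

The caller never looks at left/right links, so swapping the stand-in
for a balanced implementation would not change count_distinct_words at
all, and the tree could be tested on its own with any comparator.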

Is there any advantage to integrating the code into ginbulk.c? Does it
let us get away with finer-grained locks, or play any tricks such as
making GIN-specific changes while we're in the middle of rebalancing,
or anything like that?

--
greg