bitmap indexes - performance [PgSql]

Prev: [HACKERS] Planner deficiencies with views that use windowing functions
Next: [HACKERS] superfluous copydir() prototype in pg_upgrade.h

From: Bruce Momjian on 1 Jul 2010 10:16

Leonardo F wrote:
> Using as a starting point the old bitmap patch in:
>
> http://archives.postgresql.org/message-id/20081101000154.GO27872(a)fune
>
>
> I re-applied and re-worked the patch to see what kind of improvements over
> btrees bitmaps actually provided.
>
> Using a 20M rows table of 10/100/1000 random values, I've found that:
>
> 1) bulk index creation time is roughly 6 times better
> 2) index size is 6-15 times smaller (depending on column cardinality)
> 3) there's almost no difference in query times (but I have to make more
> tests)
> 4) I can't say anything about the insertion performance, but I guess
> bitmap will perform way worse than btree
>
> Are these improvements (index creation time, index size) worth enough
> to keep on working on this?
>
> I mean: given that bitmaps don't give any benefits in query times, but
> only benefits related to disk size and bulk index creation times, and
> will have horrible performance for insertions/deletions: would this job be
> worthed?
>
> In case it is: I will try to clean up the patch and post it...
>
>
> As a side note: I guess that most of the bitmap indexes performance
> improvements in the SELECT area are already implemented in postgres
> in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
> say that bitmap indexes are faster for selects, unless of course they
> are ANDed/ORed together (which is something postgres already does
> for regular btree indexes)

Great report, thanks. The other big problem with on-disk bitmap indexes
is removing expired values via vacuum.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 1 Jul 2010 11:04

On Thu, Jul 1, 2010 at 9:23 AM, Leonardo F <m_lists(a)yahoo.it> wrote:
> Using as a starting point the old bitmap patch in:
>
> http://archives.postgresql.org/message-id/20081101000154.GO27872(a)fune
>
>
> I re-applied and re-worked the patch to see what kind of improvements over
> btrees bitmaps actually provided.
>
> Using a 20M rows table of 10/100/1000 random values, I've found that:
>
> 1) bulk index creation time is roughly 6 times better
> 2) index size is 6-15 times smaller (depending on column cardinality)
> 3) there's almost no difference in query times (but I have to make more
> tests)
> 4) I can't say anything about the insertion performance, but I guess
> bitmap will perform way worse than btree
>
> Are these improvements (index creation time, index size) worth enough
> to keep on working on this?
>
> I mean: given that bitmaps don't give any benefits in query times, but
> only benefits related to disk size and bulk index creation times, and
> will have horrible performance for insertions/deletions: would this job be
> worthed?
>
> In case it is: I will try to clean up the patch and post it...
>
>
> As a side note: I guess that most of the bitmap indexes performance
> improvements in the SELECT area are already implemented in postgres
> in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
> say that bitmap indexes are faster for selects, unless of course they
> are ANDed/ORed together (which is something postgres already does
> for regular btree indexes)

Hmm... no performance improvement? That's not encouraging.

The index being smaller ought to by itself provide some performance
improvement if, say, the smaller index can fit in cache and the larger
one can't. With a 6-15x size difference, that's presumably not an
implausible scenario. But I guess the real point is to be able to AND
and OR bitmap indices on multiple columns. Not sure if this
implementation supports that or not (I haven't read the patch) and how
the performance compares to doing Bitmap Heap Scan -> BitmapAnd ->
Bitmap Index Scan with btree indices.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 1 Jul 2010 11:21

Robert Haas <robertmhaas(a)gmail.com> writes:
> Hmm... no performance improvement? That's not encouraging.

> The index being smaller ought to by itself provide some performance
> improvement if, say, the smaller index can fit in cache and the larger
> one can't. With a 6-15x size difference, that's presumably not an
> implausible scenario. But I guess the real point is to be able to AND
> and OR bitmap indices on multiple columns. Not sure if this
> implementation supports that or not (I haven't read the patch) and how
> the performance compares to doing Bitmap Heap Scan -> BitmapAnd ->
> Bitmap Index Scan with btree indices.

In principle a bitmap index scan should be significantly faster if the
index can return the bitmap more or less "natively" rather than having
to construct it. My recollection though is that a significant amount of
work is needed to make that happen, and that there is no existing patch
that tackled the problem. So I'm not sure that this report should be
taken as indicating that there's no chance of a SELECT performance
improvement. What it does say is that we have to do that work if we
want to make bitmap indexes useful.

In particular, I recall some discussions about developing a "streaming
API" whereby an index AM could return a bitmap page-by-page or so,
rather than having to construct the whole thing in-memory before
anything could happen. This would be a huge win for AND/OR cases,
and even for a simple indexscan it would eliminate the existing startup
cost penalty for a bitmap scan. Streaming like this would also
eliminate the problem of having to lossify large bitmaps in order to
not overrun memory.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 1 Jul 2010 11:40

On Thu, Jul 1, 2010 at 11:21 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> In particular, I recall some discussions about developing a "streaming
> API" whereby an index AM could return a bitmap page-by-page or so,
> rather than having to construct the whole thing in-memory before
> anything could happen. �This would be a huge win for AND/OR cases,
> and even for a simple indexscan it would eliminate the existing startup
> cost penalty for a bitmap scan. �Streaming like this would also
> eliminate the problem of having to lossify large bitmaps in order to
> not overrun memory.

Now that would be cool. The existing startup penalty for a bitmap
scan is the pits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Bruce Momjian on 1 Jul 2010 21:31

Leonardo F wrote:
> I'm trying to find more docs that explain the "improvements" of
> bitmap indexes in other products... but most of what I've found
> talks about bitmapAND/OR.... which is something that is very
> cool, but that postgres already does even with btree indexes...
> or index creation time/size, which are, for the moment, the only
> things that I'm pretty confident the patch would actually provide.

I think a real limitation of on-disk bitmap indexes is that they are
only feable for low cardinality columns, while btree handles all column
types.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2
Prev: [HACKERS] Planner deficiencies with views that use windowing functions
Next: [HACKERS] superfluous copydir() prototype in pg_upgrade.h