On Scalability [PgSql]

From: "Joshua D. Drake" on 29 Jul 2010 13:12

On Thu, 2010-07-29 at 19:08 +0200, Vincenzo Romano wrote:
> Hi all.
> I'm wondering about PGSQL scalability.
> In particular I have two main topics in my mind:
>
> 1. What'd be the behavior of the query planner in the case I have
> a single huge table with hundreds or thousands of partial indexes
> (just differing by the WHERE clause).
> This is an idea of mine to make index-partitioning instead of
> table-partitioning.

Well the planner is not going to care about the partial indexes that
don't match the where clause but what you are suggesting is going to
make writes and maintenance extremely expensive. It will also increase
planning time as the optimizer at a minimum has to discard the use of
those indexes.
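
For anyone who wants to measure this rather than guess, here is a minimal
sketch that generates the kind of "index partitioning" DDL described above,
so it can be loaded into a scratch database and timed. The table and column
names (events, created_at, company_id) are hypothetical and not taken from
this thread; the statements are plain PostgreSQL partial-index syntax.

```python
def partial_index_ddl(table, column, key_column, values):
    """Return one CREATE INDEX ... WHERE statement per key value.

    All names are illustrative assumptions; values are expected to be
    integers so that no quoting is needed.
    """
    statements = []
    for i, value in enumerate(values):
        statements.append(
            f"CREATE INDEX {table}_{column}_p{i} ON {table} ({column}) "
            f"WHERE {key_column} = {value};"
        )
    return statements


# Three partial indexes that differ only in their WHERE clause.
ddl = partial_index_ddl("events", "created_at", "company_id", range(1, 4))
for stmt in ddl:
    print(stmt)
```

Raising the range to hundreds or thousands of values and timing EXPLAIN
and INSERT against the resulting table is one way to see the planning and
write overhead directly.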

>
> 2. What'd be the behavior of the query planner in the case I have
> hundreds or thousands of child tables, possibly in a multilevel hierarchy
> (let's say, partitioning by year, month and company).

Again, test it. Generally speaking, the number of child tables directly
correlates with planning time. Most people find that 60-100 tables is
really the highest you can go.

It all depends on actual implementation and business requirements
however.
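
As a starting point for such a test, the sketch below generates one
inherited child table per month, each with a CHECK constraint on the date
range, which is the usual way to partition with table inheritance. The
parent table and column names (measurements, logged_on) are assumptions
made up for illustration, and only a single level of inheritance is shown.

```python
from datetime import date


def monthly_children(parent, date_column, year, months):
    """Return CREATE TABLE ... INHERITS statements, one per month.

    Each child carries a CHECK constraint on a half-open date range.
    Names are hypothetical and only serve the example.
    """
    statements = []
    for month in months:
        start = date(year, month, 1)
        # First day of the following month, rolling over in December.
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        child = f"{parent}_{year}_{month:02d}"
        statements.append(
            f"CREATE TABLE {child} ("
            f"CHECK ({date_column} >= DATE '{start}' AND {date_column} < DATE '{end}')"
            f") INHERITS ({parent});"
        )
    return statements


children = monthly_children("measurements", "logged_on", 2010, range(1, 13))
for stmt in children:
    print(stmt)
```

Generating several years of children this way, then running EXPLAIN on a
query against the parent, shows how planning time grows with the number of
child tables.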

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Joshua D. Drake" on 29 Jul 2010 13:39

On Thu, 2010-07-29 at 19:34 +0200, Vincenzo Romano wrote:

> I expect that a more complex schema will imply higher workloads
> on the query planner. What I don't know is how the increase in the
> workload will happen: linearly, sublinearly, polynomially or what?
>
> Significant testing would require a prototype implementation with
> an almost complete feed of data from the current solution.
> But I'm at the feasibility study stage and have not enough resources
> for that.
>
> Thanks anyway for the insights, Joshua.
> Does the 60-100 tables limit apply to a single level
> of inheritance? Or is it more general?

I do not currently have experience (except that it is possible) with
multi-level inheritance and postgresql.


From: Greg Stark on 30 Jul 2010 06:49

On Fri, Jul 30, 2010 at 11:24 AM, Vincenzo Romano
<vincenzo.romano(a)notorand.it> wrote:
> At a first glance it seems that for inheritance some bottleneck is
> hindering a full exploit for table partitioning.

There have been lengthy discussions of how to implement partitioning
to fix these precise problems, yes.

> Is there anyone who knows whether those algorithms are linear or not?

They're linear in both cases. But they happen at plan time rather than
query execution time. So if your application prepares all its queries
and then uses them many times it would not slow down query execution
but would slow down the query planning time. In some applications this
is much better, but in others unpredictable run-times are as bad as long
run-times.

Also in the case of having many partial indexes it would slow down
inserts and updates as well, though to a lesser degree, and that would
happen at execution time.
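
The amortization argument above can be written down as a toy model. This is
only a sketch: the cost constants are hypothetical, the units are arbitrary,
and nothing here is measured from PostgreSQL. It assumes planning cost grows
linearly with the number of partitions and that a prepared query is planned
once, while an unprepared one is planned on every execution.

```python
def total_time(n_partitions, executions, prepared,
               plan_cost_per_partition=1.0, exec_cost=5.0):
    """Toy cost model: linear planning cost plus a fixed execution cost.

    plan_cost_per_partition and exec_cost are made-up constants in
    arbitrary units, chosen only to make the comparison visible.
    """
    plan_once = plan_cost_per_partition * n_partitions  # linear in partitions
    plans = 1 if prepared else executions                # prepared: plan once
    return plans * plan_once + executions * exec_cost


unprepared_total = total_time(100, 1000, prepared=False)
prepared_total = total_time(100, 1000, prepared=True)
print(unprepared_total, prepared_total)
```

The model covers planning cost only; it says nothing about how well the
stored plan performs at execution time.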

--
greg


From: Vincenzo Romano on 30 Jul 2010 07:40

2010/7/30 Greg Stark <gsstark(a)mit.edu>:
> On Fri, Jul 30, 2010 at 11:24 AM, Vincenzo Romano
> <vincenzo.romano(a)notorand.it> wrote:
>> At a first glance it seems that for inheritance some bottleneck is
>> hindering a full exploit for table partitioning.
>
> There have been lengthy discussions of how to implement partitioning
> to fix these precise problems, yes.

Any reference?

>> Is there anyone who knows whether those algorithms are linear or not?
>
> They're linear in both cases. But they happen at plan time rather than
> query execution time. So if your application prepares all its queries
> and then uses them many times it would not slow down query execution
> but would slow down the query planning time. In some applications this
> is much better, but in others unpredictable run-times are as bad as long
> run-times.

Hmmm ... maybe I'm missing the inner meaning of your remarks, Greg.
By using PREPARE I run the query planner sooner and I should use
the plan at the later execution.
You can bet that some of the PREPAREd query variables will
pertain to either the child table's CHECK constraints (for table partitions)
or to the partial index's WHERE condition (for index partitioning).

It's exactly at this point (execution time) that the "linearity" will
kill the query over a heavily partitioned table.

Is this what you meant? :-)

> Also in the case of having many partial indexes it would slow down
> inserts and updates as well, though to a lesser degree, and that would
> happen at execution time.

This makes perfect sense to me.

--
Vincenzo Romano at NotOrAnd Information Technologies
Software Hardware Networking Training Support Security
--
NON QVIETIS MARIBVS NAVTA PERITVS


From: Vincenzo Romano on 30 Jul 2010 15:50

2010/7/30 Josh Berkus <josh(a)agliodbs.com>:
>
>> Is there anyone who knows whether those algorithms are linear or not?
>
> Read the code? It's really very accessible, and there's lots and lots
> of comments. While the person who wrote the code is around, isn't it
> better to see the real implementation?

If the programmer(s) who wrote that part are around, a simple hint would suffice.
Even a hint about where to look in the code would be much appreciated: the query
planner is not as simple as the "ls" command (which is not that simple any
more, though).

It looks like I need to go the hard way ...
Starting from postgresql-8.4.4/src/backend/optimizer

