Prev: Obtain a University Degree based on your professional experience for a more satisfactory life. admonishing akenes aglycone
Next: [HACKERS] review patch: Distinguish between unique indexes and unique constraints
From: Robert Haas on 30 Jul 2010 16:39 On Fri, Jul 30, 2010 at 3:50 PM, Vincenzo Romano <vincenzo.romano(a)notorand.it> wrote: > 2010/7/30 Josh Berkus <josh(a)agliodbs.com>: >> >>> Is there anyone who knows whether those algorithms are linear or not? >> >> Read the code? �It's really very accessible, and there's lots and lots >> of comments. �While the person who wrote the code is around, isn't it >> better to see the real implementation? > > If the programmer(s) who wrote that part is around, a simple hint would suffice. > Even an hint to where look into the code would be very appreciated: the query > planner is not as simple as the "ls" command (which is not that simple any > more, though). > > It looks like I need to go the hard way ... > Starting from postgresql-8.4.4/src/backend/optimizer I think you're approaching this in the wrong way. You've repeatedly said you don't want to do all the work of setting up a test, but trying to search the code for algorithms that might not be linear is not going to be easier. I've been reading this thread and I'm fairly familiar with this code, and I even understand the algorithms pretty well, and I don't know whether they're going to be linear for what you want to do or not. Certainly, the overall task of join planning is exponential-time in the number of *tables*, but if you're just doing SELECT statements on a single table, will that be linear? Tough to say. Certainly, there are a lot of linked lists in there, so if we have any place where we have two nested loops over the list of indices, it won't be linear. I can't think of a place where we do that, but that doesn't mean there isn't one. And even if they are linear, or n log n or something, the coefficients could still be lousy. Theoretical computer science is one of my favorite subjects, but, it doesn't always tell you what you want to know about the real world. It doesn't seem like it should be very hard to figure this out empirically. Just create a big database full of random data. Maybe you could populate one of the columns with something like (random() * 1000)::int. Then you could create partial indices ON (some_other_column) WHERE that_column = <blat> for <blat> in 0..999. Then you could run some test queries and see how you make out. Or, I mean, you can read the source code. That's fine, too. It's just... I've already read the source code quite a few times, and I still don't know the answer. Admittedly, I wasn't trying to answer this specific question, but still - I don't think it's an easy question to answer that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 30 Jul 2010 16:38 Vincenzo Romano wrote: > By using PREPARE I run the query planned sooner and I should use > the plan with the later execution. > You can bet that some of the PREPAREd query variables will > pertain to either the child table's CHECK contraints (for table partitions) > or to the partial index's WHERE condition (for index partitioning). > Prepared statements are not necessarily a cure for long query planning time, because the sort of planning decisions made with partitioned child tables and index selection can need to know the parameter values to execute well; that's usually the situation rather than the exception with partitions. You run the risk that the generic prepared plan will end up looking at all the partitions, because at preparation plan time it can't figure out which can be excluded. Can only figure that out once they're in there for some types of queries. I think you aren't quite lined up with the people suggesting "test it" in terms of what that means. The idea is not that you should build a full on application test case yet, which can be very expensive. The idea is that you might explore things like "when I partition this way increasing the partitions from 1 to n, does query time go up linearly?" by measuring with fake data and a machine-generated schema. What's happened in some of these cases is that, despite the theoretical, some constant or external overhead ends up dominating behavior for lower numbers. As an example, it was recognized that the amount of statistics for a table collected with default_statistics_target had a quadratic impact on some aspects of performance. But it turned out that for the range of interesting values to most people, the measured runtime did not go up with the square as feared. Only way that was sorted out was to build a simple simulation. Here's a full example from that discussion that shows the sort of tests you probably want to try, and comments on the perils of guessing based on theory rather than testing: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00601.php http://archives.postgresql.org/pgsql-hackers/2008-12/msg00687.php generate_series can be very helpful here, and you can even use that to generate timestamps if you need them in the data set. That said, anecdotally everyone agrees that partitions don't scale well into even the very low hundreds for most people, and doing multi-level ones won't necessarily normally drop query planning time--just the cost of maintaining the underlying tables and indexes. My opinion is that building a simple partitioned case and watching how the EXPLAIN plans change as you adjust things will be more instructive for you than either asking about it or reading the source. Vary the parameters, watch the plans, measure things and graph them if you want to visualize the behavior better. Same thing goes for large numbers of partial indexes, which have a similar query planning impact, but unlike partitions I haven't seen anyone analyze them via benchmarks. I'm sure you could get help here (probably the performance list is a better spot though) with getting your test case right if you wanted to try and nail that down. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg(a)2ndQuadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Vincenzo Romano on 30 Jul 2010 06:24 2010/7/29 Josh Berkus <josh(a)agliodbs.com>: > >> Or maybe checking against the source code and its documentation, if any. > > No, not really. What you really want to know is: what's the real > planner overhead of having dozens/hundreds of partial indexes? What's > the write overhead? There's no way you can derive that from the source > code faster than you can test it. Again, as the test would be rather killing for my group at this stage. I think that knowing whether certain parts have been implemented with linear or sub-linear (or whatever else) algorithms would give good insights about scalability. At a first glance it seems that for inheritance some bottleneck is hindering a full exploit for table partitioning. Is there anyone who knows whether those algorithms are linear or not? And of course, I agree that real tests on real data will provide the real thing. -- Vincenzo Romano at NotOrAnd Information Technologies Software Hardware Networking Training Support Security -- NON QVIETIS MARIBVS NAVTA PERITVS -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Josh Berkus on 30 Jul 2010 14:51
> Is there anyone who knows whether those algorithms are linear or not? Read the code? It's really very accessible, and there's lots and lots of comments. While the person who wrote the code is around, isn't it better to see the real implementation? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |