Prev: [HACKERS] DB crash SOS
Next: pg_dump does not honor namespaces when functions are used in index
From: Robert Haas on 6 Jul 2010 13:49 On Thu, Jun 17, 2010 at 9:34 PM, Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> wrote: > Jaime Casanova <jaime(a)2ndquadrant.com> wrote: >> This one, doesn't apply to head anymore... please update > > Thank you for reviewing my patch! > > I attached an updated patch set for partitioning syntax. I've taken a little bit more of a look at this patch and I guess I'm not too happy with the design. 1. It seems to me that the proposed design for pg_partition is poorly thought out. In particular, I don't see how this would work if we wanted to partition on multiple keys, which is a feature supported by both Oracle and MySQL. It would also be nice to give at least some thought to how we might handle partitioning by list with subpartitioning by range or hash, or range partitioning with subpartitioning by hash. We certainly don't need to do subpartitioning in the first version of the patch, but I think we should have a plan. 2. I am still of the view that the first version of this patch should correctly handle routing of INSERT and COPY data to the correct partition. But at a very minimum we need to have a plan for how we're going to implement that in a follow-on patch. I think the way to do this is to binary search a sorted array of partition keys (perhaps upper bounds for range partitioning, and exact values for list partitioning). When you find the correct key, then you find the index of that key and look up that same index in a separate array of table OIDs and insert there. While it's possible to construct such a structure from the proposed catalog structure, it requires an index scan. I'm wondering if it might be better to abandon the idea of storing the partition values in pg_inherits and instead put preconstructed arrays directly into pg_partition. That way, with a single row fetch, you can get all the data you need. I'm not sure this is better, though - other opinions? 3. For a first version of this patch, I would suggest that we only allow partitioning by base columns, rather than expressions. When someone goes to do a bulk load of data into the table, and we want to do automatic tuple routing, we're going to have to evaluate the partitioning expression(s) for every row. I'm just guessing here, but I bet it's a lot cheaper to fetch an attribute by attnum than to evaluate an arbitrary expression. So even if we add partitioning by expression later, I don't think that the work to make a special case for base columns will be wasted. 4. The dependency handling needs more thought. For example, if I do this: create table names (id integer primary key, name varchar not null) partition by range (id) (partition names_1 values less than 1000, partition names_2 values less than 2000, partition names_3 values less than 3000); and then I drop names_2, the automatically generated constraint on names_3 still says CHECK (2000 <= id AND id < 3000). And I can drop those constraints off the individual tables, too, which doesn't seem like it ought to be allowed. And if I do something like this: create or replace function f(int) returns int as $$select $1$$ immutable language sql; create table names (id integer primary key, name varchar not null) partition by range (f(id)) (partition names_1 values less than 1000, partition names_2 values less than 2000, partition names_3 values less than 3000); alter table names_1 drop constraint names_1_id_check; alter table names_2 drop constraint names_2_id_check; alter table names_3 drop constraint names_3_id_check; drop function f(int); ....then I get: ERROR: cannot drop function f(integer) because other objects depend on it DETAIL: range partition depends on function f(integer) ....but there's no identification of which range partition depends on it. The locking is also broken here. In session #1, start a transaction and do CREATE PARTITION on an existing partitioned table. Then in session #2, do DROP FUNCTION <some function> CASCADE in a manner that leads to the range partition getting dropped. Then commit session #1. Now you have a pg_inherits catalog with leftovers in inhvalues. 5. The use of the term "partition" is not very consistent. For example, we use CREATE PARTITION to create a partition, but we use DROP TABLE to get rid of it (there is no DROP PARTITION). I think that the right syntax to use here is ALTER TABLE ... ADD/DROP PARTITION; both Oracle and MySQL do it that way. And meanwhile OCLASS_PARTITION means "the partitioning information associated with the parent table", not "a partition of a parent table". 6. There's some kind of magic in here associated with indexes on the parent table - it seems that matching indexes or primary keys are automatically created on each child table. But there's no provision for keeping them in sync. If I create a partitioned table with a primary key, the key is inherited by all its current children. If I then drop the primary key, it disappears from the parent but it still exists on the children. Any new children created afterwards don't have it, however. I'm not sure whether indices should propagate from parent to child or not, but propagating whatever exists at the moment of creation and then forgetting about it doesn't seem right. 7. I'm not convinced that it's a good idea to treat ALTER TABLE parent ATTACH/DETACH PARTITION child as basically a synonym for ALTER TABLE child [NO] INHERIT parent, but even if it is the current implementation seems way too permissive (it also lacks comments and adequate documentation). You can, for example, use ATTACH PARTITION to add a new child and then NO INHERIT to detach it again; or you can use INHERIT to attach a child even when the parent is partitioned. It does however catch the case of trying to use ATTACH PARTITION to attach a child to an unpartitioned parent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Takahiro Itagaki on 7 Jul 2010 02:14 Robert Haas <robertmhaas(a)gmail.com> wrote: > I've taken a little bit more of a look at this patch and I guess I'm > not too happy with the design. Thanks. I was thinking about only syntax for partitioning in the patch, but I need more consideration about insert-aware catalog design. > 5. The use of the term "partition" is not very consistent. For > example, we use CREATE PARTITION to create a partition, but we use > DROP TABLE to get rid of it (there is no DROP PARTITION). I think > that the right syntax to use here is ALTER TABLE ... ADD/DROP > PARTITION; both Oracle and MySQL do it that way. And meanwhile > OCLASS_PARTITION means "the partitioning information associated with > the parent table", not "a partition of a parent table". "ALTER TABLE ... ADD/DROP PARTITION" was discussed many times, but I cannot solve syntax confict with "ALTER TABLE ... ADD [COLUMN]". Since we can omit COLUMN, parser treats "ADD PARTITION" as adding a column named "PARTITION". We need to add PARTITION into the reserved keyword list to avoid shift/reduce errors. Do you have any better idea? Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 7 Jul 2010 08:57 On Wed, Jul 7, 2010 at 2:14 AM, Takahiro Itagaki <itagaki.takahiro(a)oss.ntt.co.jp> wrote: >> 5. The use of the term "partition" is not very consistent. �For >> example, we use CREATE PARTITION to create a partition, but we use >> DROP TABLE to get rid of it (there is no DROP PARTITION). �I think >> that the right syntax to use here is ALTER TABLE ... ADD/DROP >> PARTITION; both Oracle and MySQL do it that way. And meanwhile >> OCLASS_PARTITION means "the partitioning information associated with >> the parent table", not "a partition of a parent table". > > "ALTER TABLE ... ADD/DROP PARTITION" was discussed many times, > but I cannot solve syntax confict with "ALTER TABLE ... ADD [COLUMN]". > Since we can omit COLUMN, parser treats "ADD PARTITION" as adding > a column named "PARTITION". We need to add PARTITION into the reserved > keyword list to avoid shift/reduce errors. > > Do you have any better idea? No, I think we're going to need to at least partially reserve that keyword. However, SQL:2003 and SQL:2008 apparently have it as a reserved keyword, so I'm hoping we can get away with that. I don't think it's worth inventing a totally different (and, IMHO, not very appealing) syntax just to avoid reserving a keyword that is reserved in the standard. http://www.postgresql.org/docs/current/static/sql-keywords-appendix.html -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 15 Jul 2010 08:18 On Tue, 2010-07-06 at 13:49 -0400, Robert Haas wrote: > 1. It seems to me that the proposed design for pg_partition is poorly > thought out. In particular, I don't see how this would work if we > wanted to partition on multiple keys, which is a feature supported by > both Oracle and MySQL. It would also be nice to give at least some > thought to how we might handle partitioning by list with > subpartitioning by range or hash, or range partitioning with > subpartitioning by hash. We certainly don't need to do > subpartitioning in the first version of the patch, but I think we > should have a plan. Or at least a way to store that information if/when it exists later. > 2. I am still of the view that the first version of this patch should > correctly handle routing of INSERT and COPY data to the correct > partition. But at a very minimum we need to have a plan for how we're > going to implement that in a follow-on patch. I think the way to do > this is to binary search a sorted array of partition keys (perhaps > upper bounds for range partitioning, and exact values for list > partitioning). When you find the correct key, then you find the index > of that key and look up that same index in a separate array of table > OIDs and insert there. While it's possible to construct such a > structure from the proposed catalog structure, it requires an index > scan. I'm wondering if it might be better to abandon the idea of > storing the partition values in pg_inherits and instead put > preconstructed arrays directly into pg_partition. That way, with a > single row fetch, you can get all the data you need. I'm not sure > this is better, though - other opinions? Agreed that it is really important. The heart of partitioning is the metadata that will allow us to do insert routing as well as nested joins using dynamic routing. We *must* plan for that so that the command syntax and catalog storage delivers what is required. This patch must not be just about syntax. The required usage drives the syntax, not the other way around. > 3. For a first version of this patch, I would suggest that we only > allow partitioning by base columns, rather than expressions. When > someone goes to do a bulk load of data into the table, and we want to > do automatic tuple routing, we're going to have to evaluate the > partitioning expression(s) for every row. I'm just guessing here, but > I bet it's a lot cheaper to fetch an attribute by attnum than to > evaluate an arbitrary expression. So even if we add partitioning by > expression later, I don't think that the work to make a special case > for base columns will be wasted. Agree that part should come out for now and resubmit as a later patch. Lets keep it simple in the first version. > 5. The use of the term "partition" is not very consistent. For > example, we use CREATE PARTITION to create a partition, but we use > DROP TABLE to get rid of it (there is no DROP PARTITION). I think > that the right syntax to use here is ALTER TABLE ... ADD/DROP > PARTITION; both Oracle and MySQL do it that way. And meanwhile > OCLASS_PARTITION means "the partitioning information associated with > the parent table", not "a partition of a parent table". Definitely do not want CREATE PARTITION. ALTER TABLE is the best place. > 6. There's some kind of magic in here associated with indexes on the > parent table - it seems that matching indexes or primary keys are > automatically created on each child table. But there's no provision > for keeping them in sync. If I create a partitioned table with a > primary key, the key is inherited by all its current children. If I > then drop the primary key, it disappears from the parent but it still > exists on the children. Any new children created afterwards don't > have it, however. I'm not sure whether indices should propagate from > parent to child or not, but propagating whatever exists at the moment > of creation and then forgetting about it doesn't seem right. IMHO it should be optional as to whether all partitions have identical indexing. It is an important aspect of the design that an historical table may have different indexes on different parts of the table, since different users/use cases exist for access to that data. No problem if some people want that though. > 7. I'm not convinced that it's a good idea to treat ALTER TABLE parent > ATTACH/DETACH PARTITION child as basically a synonym for ALTER TABLE > child [NO] INHERIT parent, but even if it is the current > implementation seems way too permissive (it also lacks comments and > adequate documentation). You can, for example, use ATTACH PARTITION > to add a new child and then NO INHERIT to detach it again; or you can > use INHERIT to attach a child even when the parent is partitioned. It > does however catch the case of trying to use ATTACH PARTITION to > attach a child to an unpartitioned parent. Agreed. If all we are doing is adding synonyms for existing feature then its not good enough. We need a new syntax that does not need to be backwards compatible, allowing various code streamlining and more targeting to the desired use case. Inheritance != partitioning. Similar, maybe, but not identical. Probably also the only way we can move forwards without breaking all the existing user code in subtle ways. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 15 Jul 2010 11:35 Simon Riggs <simon(a)2ndQuadrant.com> writes: > Agreed. If all we are doing is adding synonyms for existing feature then > its not good enough. We need a new syntax that does not need to be > backwards compatible, allowing various code streamlining and more > targeting to the desired use case. Inheritance != partitioning. Similar, > maybe, but not identical. Probably also the only way we can move > forwards without breaking all the existing user code in subtle ways. My feeling about it is that partitioning should be a subset of inheritance --- that is, a partitioned table is an inheritance tree, but with additional constraints/properties/catalog information. In the case at hand, that means that you couldn't use ALTER TABLE INHERIT to install a new partition, but only because it would fail to provide the additional information needed (partition key info). ALTER TABLE ATTACH PARTITION is like INHERIT except it also provides the extra partitioning info. OTOH, DETACH PARTITION is not really significantly different from ALTER NO INHERIT --- you could allow them to be used interchangeably. Though I'd still favor keeping them separate just for consistency of the DDL language. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
First
|
Prev
|
Pages: 1 2 Prev: [HACKERS] DB crash SOS Next: pg_dump does not honor namespaces when functions are used in index |