Partitioning/inherited tables vs FKs [PgSql]

Prev: [HACKERS] Partitioning/inherited tables vs FKs
Next: LD_LIBRARY_PATH versus rpath

From: Florian Pflug on 6 May 2010 05:31

On May 6, 2010, at 10:52 , Boszormenyi Zoltan wrote:
> =# create table parent (id serial primary key, t text);
> ...
> =# create table child () inherits (parent);
> ...
> =# create table refer (id serial primary key, parent_id integer
> ...
> =# insert into child (t) values ('a') returning id;
> ...
> =# select * from parent;
> id | t
> ----+---
> 1 | a
> (1 sor)
>
> =# insert into refer (parent_id) values (1);
> ERROR: insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL: Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.

Yeah, this is a long-standing issue with inheritance. Table inheritance in postgres isn't much more than an implicit UNION done on selects plus some logic in ALTER TABLE to keep propagate structural changes. Indices and constraints basically always behave as if ONLY had been specified. I'm not even sure if the ids are globally unique in your example - it might be that each child's "id serial" column gets its very own sequence.

One possible workaround is no create a table, say referred_ids, that contains all the ids from parent and all of its children, kept up-to-date via triggers, and point the FK constraint to that table. That also allows for a global unique constraint on the ids by definition a suitable unique or primary key constraint on referred_ids.

What lies at the heart of this problem is the lack of multi-table indices and hence multi-table unique constraints in postgres. AFAIK with those in place the rest amounts to the removal of ONLY from the constraint check queries plus some code to propagate constraint triggers to child tables.

best regards,
Florian Pflug

From: Jaime Casanova on 6 May 2010 06:37

2010/5/6 Boszormenyi Zoltan <zb(a)cybertec.at>:
>
> =# insert into refer (parent_id) values (1);
> ERROR: Â insert or update on table "refer" violates foreign key
> constraint "refer_parent_id_fkey"
> DETAIL: Â Key (parent_id)=(1) is not present in table "parent".
>
> The use case for this was there were different news items,
> and there were another table for summaries, that could point
> to any of the news items table. Another use case could be
> a large partitioned table with an FK to the main table where
> the referring table might only contain very few "interesting" data.
>
> No matter what are the semantics, the parent table in the
> inheritance chain cannot be used as and endpoint for FKs.
>
> Is it a bug, or intentional?

i would call it a bug, but this is a known issue

>
> The only solution currently is that the referring table has to be
> partitioned the same way as the referred table in the FK, and
> its parent table has to be queried.
>

no, you can install a trigger on the child table that verifies the
existence of the id on your partitioned parent table, the SELECT
you'll use inside that trigger will look at the entire set of tables
(as long as you don't use FROM ONLY)

also could be useful to put an index (even a PK) on every child to
ensure uniqueness and make the SELECT more efficient, and of course a
check constraint in every child emulating a partition key

--
Jaime Casanova www.2ndQuadrant.com
Soporte y capacitaciÃ³n de PostgreSQL

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 6 May 2010 08:35

On Thu, May 6, 2010 at 6:37 AM, Jaime Casanova <jaime(a)2ndquadrant.com> wrote:
> i would call it a bug, but this is a known issue
>
>>
>> The only solution currently is that the referring table has to be
>> partitioned the same way as the referred table in the FK, and
>> its parent table has to be queried.
>>
>
> no, you can install a trigger on the child table that verifies the
> existence of the id on your partitioned parent table, the SELECT
> you'll use inside that trigger will look at the entire set of tables
> (as long as you don't use FROM ONLY)
>
> also could be useful to put an index (even a PK) on every child to
> ensure uniqueness and make the SELECT more efficient, and of course a
> check constraint in every child emulating a partition key

The referential integrity triggers contain some extra magic that isn't
easily simulatable in userland, and that is necessary to make the
foreign key constraints airtight. We've discussed this previously but
I don't remember which thread it was or the details of when things
blow up. I think it's something like this: the parent has a tuple
that is not referenced by any child. Transaction 1 begins, deletes
the parent tuple (checking that it has no children), and pauses.
Transaction 2 begins, adds a child tuple that references the parent
tuple (checking that the parent exists, which it does), and commits.
Transaction 1 commits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 6 May 2010 10:38

Florian Pflug <fgp(a)phlo.org> writes:
> What lies at the heart of this problem is the lack of multi-table
> indices and hence multi-table unique constraints in postgres. AFAIK
> with those in place the rest amounts to the removal of ONLY from the
> constraint check queries plus some code to propagate constraint
> triggers to child tables.

Well, the lack of multi-table indexes certainly is the heart of the
problem, but I'm not sure that inventing such a thing is the solution.
Quite aside from the implementation difficulties involved in it,
doing things that way would destroy some of the major reasons to
partition tables at all:

* the index grows as the size of the total data set, it's not limited
by partition size

* can't cheaply drop one partition any more, you have to vacuum the
(big) index first

* probably some other things I'm not thinking of at the moment.

I think the real solution is to upgrade the partitioning infrastructure
so that we can understand that columns are unique across the whole
partitioned table, when the partitioning is done on that column and each
partition has a unique index.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Florian Pflug on 6 May 2010 12:04

On May 6, 2010, at 16:38 , Tom Lane wrote:
> Florian Pflug <fgp(a)phlo.org> writes:
>> What lies at the heart of this problem is the lack of multi-table
>> indices and hence multi-table unique constraints in postgres. AFAIK
>> with those in place the rest amounts to the removal of ONLY from the
>> constraint check queries plus some code to propagate constraint
>> triggers to child tables.
>
> Well, the lack of multi-table indexes certainly is the heart of the
> problem, but I'm not sure that inventing such a thing is the solution.
> Quite aside from the implementation difficulties involved in it,
> doing things that way would destroy some of the major reasons to
> partition tables at all:
>
> * the index grows as the size of the total data set, it's not limited
> by partition size
>
> * can't cheaply drop one partition any more, you have to vacuum the
> (big) index first
>
> * probably some other things I'm not thinking of at the moment.
>
> I think the real solution is to upgrade the partitioning infrastructure
> so that we can understand that columns are unique across the whole
> partitioned table, when the partitioning is done on that column and each
> partition has a unique index.

True, for partitioned tables multi-table indices reintroduce some of the performance problems that partitioning is supposed to avoid.

But OTOH if you use table inheritance as a means to map data models (e.g. EER) more naturally to SQL, then multi-table indices have advantages over the partitioning-friendly solution you sketched above.

With a multi-table index, SELECT * FROM PARENT WHERE ID=?? has complexity LOG(N*M) where M is the number of tables inheriting from PARENT (including PARENT itself), and N the average number of rows in these tables. With one index per child, the complexity is M*LOG(N) which is significantly higher if M is large. Constraint exclusion could reduce that to LOG(N), but only if each child is has it's own private ID range which precludes ID assignment from a global sequence and hence makes ID assignment much more complex and error-prone.

Anyway, I was wondering why we need guaranteed uniqueness for FK relationships anyway. Because if we don't (which I didn't check prior to posting this I must admit), then why can't we simply remove the "ONLY" from the RI queries and let ALTER TABLE attach the RI triggers not only to the parent but also to all children. What am I missing?

best regards,
Florian Pflug

| Next | Last
Pages: 1 2 3 4
Prev: [HACKERS] Partitioning/inherited tables vs FKs
Next: LD_LIBRARY_PATH versus rpath