WIP: generalized index constraints [PgSql]

Prev: corrupted double-linked list
Next: Timestamp to time_t

From: Brendan Jurd on 13 Sep 2009 05:08

2009/8/21 Brendan Jurd <direvus(a)gmail.com>:
> 2009/8/21 Jeff Davis <pgsql(a)j-davis.com>:
>> On Fri, 2009-08-21 at 12:23 +1000, Brendan Jurd wrote:
>>> The current behaviour seems to be predicated on the unique constraint
>>> being an integral part of the index itself. While this might be true
>>> from a system catalog point of view (pg_index.indisunique), if a user
>>> says that they want to copy a table's structure INCLUDING INDEXES
>>> EXCLUDING CONSTRAINTS then IMO they've made their intention perfectly
>>> clear. They'd expect it to create an index sans the unique
>>> constraint. Ignoring the user's intention and copying the index as-is
>>> (including the unique constraint) would be unfriendly.
>>
>> I don't have strong feelings either way. I think that's probably a
>> separate patch, and a fairly small patch.
>>
>
> Yeah, as I was writing the above I was thinking that it might end up a
> separate patch. However I was also concerned that it might be less
> disruptive if we implement your patch with the less-astonishing
> behaviour and fix the unique index case in passing, than to commit
> your patch with the bad behavior and then fix both.
>
> Up to you.
>

Hi Jeff,

Any update on this patch? The discussion appeared to trail off around
21 Aug with some inconclusive thoughts about handling the corner cases
in CREATE TABLE LIKE.

The September CF starts in a couple of days, so this patch is in
danger of missing the boat.

The unresolved points seem to be:

* What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
Postgres gets this wrong for unique indexes currently. Should we
persist with the existing behaviour or fix it as part of this patch?
My personal feeling was +1 for fixing it in this patch.

* Should we emit some sort of message when the user specifies
INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both? I didn't
have strong feelings about this one but there was some differing
thoughts about what log level to use. I thought NOTICE but Alvaro
reckons we've got too many of those already. Tom mentioned the
suggested (but unimplemented) NOVICE level, which seems like a good
move but doesn't resolve the problem of what to do in this patch. One
option would be to add a message at the NOTICE level with a TODO to
downgrade it to NOVICE if/when that becomes available.

Cheers,
BJ

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 14 Sep 2009 05:33

On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> The September CF starts in a couple of days, so this patch is in
> danger of missing the boat.

Thanks for keeping track. I accomplished a significant amount today, so
there's still hope for 9/15.

I will most likely just focus on the core functionality so that I have
something complete and reviewable.

> The unresolved points seem to be:
>
> * What to do about INCLUDING INDEXES EXCLUDING CONSTRAINTS --
> Postgres gets this wrong for unique indexes currently. Should we
> persist with the existing behaviour or fix it as part of this patch?
> My personal feeling was +1 for fixing it in this patch.

I don't think that it should make a difference whether "EXCLUDING
CONSTRAINTS" is specified or omitted. There is no "[INCLUDING|EXCLUDING]
CONSTRAINTS" option in the standard, but for the other LIKE options,
EXCLUDING is implied when INCLUDING is not specified.

So, I think we have to make a decision:
1. If INCLUDING CONSTRAINTS is specified, but not INCLUDING INDEXES,
do we: copy the indexes silently; or emit a nice message; or throw
an ERROR?
2. What if INCLUDING INDEXES is specified, but not INCLUDING
CONSTRAINTS?

> * Should we emit some sort of message when the user specifies
> INCLUDING INDEXES or INCLUDING CONSTRAINTS but not both? I didn't
> have strong feelings about this one but there was some differing
> thoughts about what log level to use. I thought NOTICE but Alvaro
> reckons we've got too many of those already. Tom mentioned the
> suggested (but unimplemented) NOVICE level, which seems like a good
> move but doesn't resolve the problem of what to do in this patch. One
> option would be to add a message at the NOTICE level with a TODO to
> downgrade it to NOVICE if/when that becomes available.

I don't think either of these things are a huge amount of work; they are
mostly just decisions that need to be made. I'll start off implementing
whatever is easiest/cleanest, and we'll continue the discussion.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 15 Sep 2009 04:46

On Sun, 2009-09-13 at 19:08 +1000, Brendan Jurd wrote:
> Any update on this patch?

Attached is the latest version.

Changes:

* Merged with HEAD
* Changed from storing the information in pg_index to pg_constraint.
This required rewriting a large portion of the patch, so it's not a
clean diff from the previous patch.
* Implemented the language using ALTER TABLE to add the constraint, as
discussed (with a slight change to avoid the extra "INDEX" keyword).
* Added docs
* Added tests
* All relations with relindconstraints == 0 do not pass through the
index constraints enforcement code at all. I did this a little
differently than what I laid out in the design, but the idea is the
same and it should avoid any problems.

That's the good news. The bad news:

* No pg_dump support yet (shouldn't be too hard)
* Creating a new constraint does not check the existing data for
conflicts.
* Doesn't work like the new deferrable unique constraints yet (also
shouldn't be too hard).
* I didn't make any changes to the behavior of LIKE (also not hard).
* Can't be specified at CREATE INDEX time. I don't think this is a
showstopper, and it will take some significant effort. The advantage
of allowing this is that the constraints can be checked more quickly
(in theory) while building the index, and it also might be handy
shorthand. However, it suffers from a number of problems:
1. Extra syntax that is almost entirely redundant.
2. More work.
3. The performance gains will probably be pretty marginal. We have
to do N index scans anyway; the savings would only be due to
the better caching impact and the fact that the index in the
process of being built is smaller than an already-built index.
So, right now I'm not in a hurry to fix this last point.

I realize that some of the things missing make the patch uncomittable in
its current form. However, I would still appreciate a review of what I
have ready.

Regards,
Jeff Davis

From: Brendan Jurd on 15 Sep 2009 08:52

2009/9/15 Jeff Davis <pgsql(a)j-davis.com>:
> Attached is the latest version.

Hi Jeff,

I'm just getting started reviewing this version now. I noticed that
your patch seems to have been generated by git. Are you hosting this
work on a public repo somewhere that I can pull from? Also I think
the committers generally prefer context diffs (pipe it through
"filterdiff --format=context --strip=1") in submissions.

Regarding the documentation updates, I think you might want to add
some commentary to Chapter 11: Indexes -- perhaps add a new section
after 11.6 Unique Indexes to talk about general index constraints,
and/or update the wording of 11.6 to reflect your changes.

The paragraph explaining index_constraints in ALTER TABLE may be a
little bit confusing.

<quote>
ADD index_constraint

This form adds a new index constraint to the table which is
enforced by the given index. The operator provided must be usable as a
search strategy for the index, and it also must detect conflicts
symmetrically. The semantics are similar to a unique index but it
opens up new possibilities. A unique index can only detect conflicts
when the two values are equal, and that behavior is equivalent to an
index constraint where all operators are the equality operator.

However, an index constraint allows you to use other operators as
well, such as the overlaps operator provided for the circle data type.
The index constraint will ensure that no two circles overlap. See
example below.
</quote>

My eyes started to cross in the second sentence. "Detect conflicts
symmetrically"? I have actually *used* this feature successfully in
testing the patch, and I still don't know quite what to make of that
phrase. You might need to dumb it down.

It might also be good to be a bit more explicit about the way the
choice of operators works. It is the inverse of the logic used to
express an ordinary value constraint. E.g., when you use the equality
operator in an index constraint you are in effect saying that new rows
MUST NOT satisfy this operator for any existing rows.

I'll continue reviewing and post more comments on the code itself shortly.

Cheers,
BJ

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Brendan Jurd on 15 Sep 2009 09:21

2009/9/15 Jeff Davis <pgsql(a)j-davis.com>:
> Attached is the latest version.
>

The new error message for a conflict is:

ERROR: index constraint violation detected
DETAIL: tuple conflicts with existing data

How about also including the name of the constraint (or index) that
was violated? I could imagine this error message being frustrating
for someone who had a table with multiple index constraints, as they
wouldn't know which one had raised the conflict.

Also, the DETAIL part should be written as a full sentence with
leading capital and full stop [1], see

I deliberately tried to create an index constraint using a bogus
operator, to see what would happen:

postgres=# alter table circles add constraint circles_overlap (c <->)
using index circle_idx;
ERROR: no strategy found for operator 1520 in operator family 2595

The error message is pretty unfriendly, but I'm ambivalent about
whether it's worth doing anything about this particular case.

One of the comments I made in my original review [2] was that "\d" in
psql should show the constraint. I don't think you've addressed this
in the current version.

Cheers,
BJ

[1] http://www.postgresql.org/docs/current/static/error-style-guide.html
[2] http://archives.postgresql.org/message-id/37ed240d0907152222w7ccfc13i8ce8d11a0c517e8(a)mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11
Prev: corrupted double-linked list
Next: Timestamp to time_t