WIP: generalized index constraints [PgSql]

Prev: corrupted double-linked list
Next: Timestamp to time_t

From: Jeff Davis on 16 Sep 2009 13:42

On Wed, 2009-09-16 at 10:14 +0300, Peter Eisentraut wrote:
> What they should be called is generalized unique constraints, without
> reference to "index". Because what they generalize is the operator by
> which uniqueness is determined.

How about GUC, for short? ;-)

Do you think that Tomás's suggestion of "proximity constraints" would
satisfy your requirement on the basis that "proximity" is a more general
form of the word "unique"?

> First, we have so far been fairly consistent to document that unique
> indexes are an implementation detail of unique constraints and should
> usually not be used directly. This new approach basically reverses that
> and forces you to define your constraints by means of implementation
> details rather than a logical description. There is nothing in this
> feature that makes it strikingly different from the existing constraint
> types in a way that would prevent a reasonable syntax for defining the
> constraint at table definition time. Another problem this would lead to
> is that a say dump of a table definition wouldn't actually contain all
> the constraints that apply to the table anymore, because there might be
> additional stuff such as this that can't be expressed that way.

Those are all good points. I think ultimately we need to support this at
table creation time and make index specification optional.

We do need to allow the user to specify the index, however. An important
use case of this feature is defining an index on (A, B, C) and using it
to enforce UNIQUE(A,B) and UNIQUE(A,C).

> CREATE TABLE circles (c circle UNIQUE ON gist &&);

Should we use USING instead of ON to be consistent with CREATE INDEX?

Also, right now a UNIQUE by itself creates a btree. Let's say the user
has btree_gist installed and they declare (a =, b =) without specifying
the AM. Which one do we choose?

I suppose we can come up with a default order, like btree, hash, gist,
gin; and just choose the first one with a matching opclass.

Also, we should decide whether UNIQUE(a,b) should be shorthand for (a =,
b =), or whether they should be treated differently somehow. If so,
we'll need to come up with some kind of rules for how it determines that
UNIQUE chooses to use a btree with indisunique enforcement; and we need
to come up with a way to force it to choose the new enforcement
mechanism in my patch if the user wants to.

> CREATE TABLE data (a int UNIQUE ON btree =);

I think we should provide some way for the user to specify what
enforcement mechanism is used when multiple options are possible. In
this example should it use the existing mechanism or the new mechanism?

> ALTER TABLE circles ADD CONSTRAINT circles_idx_constr (c &&) USING INDEX
> circles_idx;
>
> doesn't seem very intuitive about what is actually being constrained.
> For a while I was thinking that it was constraining the table to values
> that are in the index or something. So using a word such as UNIQUE
> would help explain what is going on.

I'm still uncomfortable using the word UNIQUE. I see that you are taking
an approach similar to ORDER BY ... USING. However, I'm concerned
because we're using a special case word to describe the general feature,
and I think that's confusing.

On the other hand, it's hard to argue that (a &&, b =) is intuitive, so
I'll acquiesce if you get some agreement from someone else.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 16 Sep 2009 13:48

On Tue, 2009-09-15 at 22:52 +1000, Brendan Jurd wrote:
> I'm just getting started reviewing this version now. I noticed that
> your patch seems to have been generated by git.

Ok, I now have a public git repo on git.postgresql.org, and I rebased my
patch before I pushed it.

See updates in my "generalized-constraints" branch. I have psql support
now, so it might be slightly easier to review.

The language and name of the feature are going through a little turmoil
right now, as you can see, so I'm trying to keep up with that. As that
settles down I'll improve the docs.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 16 Sep 2009 13:54

On Wed, Sep 16, 2009 at 3:14 AM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> On Tue, 2009-09-15 at 12:37 -0400, Robert Haas wrote:
>> Instead of calling these generalized index constraints, I wonder if we
>> oughtn't to be calling them something like "don't-overlap constraints"
>> (that's a bad name, but something along those lines). They're not
>> really general at all, except compared to uniqueness constraints (and
>> they aren't called generalized unique-index constraints, just
>> generalized index constraints).
>
> What they should be called is generalized unique constraints, without
> reference to "index". Because what they generalize is the operator by
> which uniqueness is determined.

Well, it should eventually be possible to use this feature to create
an index which excludes overlapping ranges in fact, unless I
misunderstand, that's the principle likely use case. Which is not
unique-ness at all.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: tomas on 16 Sep 2009 15:40

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Sep 16, 2009 at 09:45:52AM -0700, Jeff Davis wrote:
> On Wed, 2009-09-16 at 15:11 +0200, tomas(a)tuxteam.de wrote:
> > One question: does the operator have to be reflexive? I.e. "A op A holds
> > for all A"?
>
> I don't think that reflexivity is a strict requirement. You could make
> this a constraint over a boolean attribute such that false conflicts
> with true and true conflicts with false. That would mean that your table
> would have to consist of either all false or all true.

Let me see whether I've understood this: more in general, op could be
"not equal" -- i.e. <>. Then, once one value was inserted into the
column, all other values would have to be equal.

[...]

> That's an interesting idea: "proximity constraint". I like it because
> (a) "proximity" might reasonably be considered a more general form of
> the word "unique", which might satisfy Peter's argument; (b) it conveys
> the use case; and (c) it sounds good.

Yes, "riding" on this geometric metaphor (distance), I was trying to
visualize relations which are symmetric but not refexive and came up
with things like

"point X is at least d1 far, at most d2 far from Y"

i.e. Y are all points which are whithin a ring centered on Y, with inner
radius d1 and outer radius d2. Special cases would be d1=0, d2>0 (that's
conventional proximity) -- but there are weirder cases, as your example
above (d2 possibly infinite, d1 small, giving a "punctured space").

> There are a couple bizarre cases where "proximity" doesn't quite fit,
> like my boolean example above, but I'm OK with that.

Hmmm.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFKsT88Bcgs9XrR2kYRAlmcAJ9+rP7AkimXRPoKGaBoJkthX2LzggCfTWst
KF/XMRouhlEbQcORaeoIbc0=
=BBEF
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 19 Sep 2009 13:48

I think we have a reasonable consensus around the name "operator
exclusion constraints", Robert Haas's suggestion. I am OK with that
name, and it got support from David Fetter and Tom Lane. As David Fetter
said, it's useful for the name to hint at the API.

Peter had some reasonable objections to that name, but the word "unique"
just doesn't cut it for this feature. My feature allows constraints
which are more restrictive than a unique constraint; but the final straw
was after a discussion with Tomás in which we determined that you can
also define constraints which are the opposite of unique: all values
must be the same (by using <> as the operator*).

I agree with Peter that we should support creating these constraints at
table creation time. This can be supported with the following syntax:

CONSTRAINT foo_constr (a <op>, ...)
{ USING INDEX foo_idx | USING method }

and it's also a more declarative syntax for the ALTER TABLE case, and
prevents a series of other problems that Peter pointed out.

There's an important unresolved question with this patch that I need to
address, which just came to light: what about functional/expression
indexes?

Say you have a table foo(a text, b text) and an index on:
((a||b)::circle)

You could define an operator constraint like:
((a||b)::circle &&)

and that would be sane. But I suppose I should also allow any expression
with the same signature, like:
((b||a)::circle &&)

[ not a very realistic example, but it seems like it may be useful ]

Does that make sense? Does someone have a better idea? Am I missing
other issues here?

How do I test if two functions/expressions:
a. are identical?
b. have matching signatures?

Regards,
Jeff Davis

*: Understandably, there is no strategy for <> for most data types.
However, if your constraint is that all values must be the same, it's
quite reasonable to add one and be able to use an index to quickly find
values that are different.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12
Prev: corrupted double-linked list
Next: Timestamp to time_t