extended operator classes vs. type interfaces [PgSql]

From: Dimitri Fontaine on 9 Apr 2010 04:10

Hi,

First, I like the way you went back to the needs before trying to
organize an approach to a solution. Saying so lets me cut a lot of
your text: it's the part I agree with :)

Robert Haas <robertmhaas(a)gmail.com> writes:
> Given a type T, I think we'd like to be able to define a type U as
> "the natural type to be added to or subtracted from T". As Jeff
> pointed out to me, this is not necessarily the same as the underlying
> type. For example, if T is a timestamp, U is an interval; if T is a
> numeric, U is also a numeric; if T is a cidr, U is an integer. Then
> we'd like to define a canonical addition operator and a canonical
> subtraction operator. I think that would be sufficient for the needs
> of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be
> nearly sufficient for range types, but in that case you also need to
> specify the unit increment within U - i.e. a "1" value for the
> datatype. It may or may not be worth building the concept of a unit
> increment into the type interface machinery, though: one could imagine
> two different range types built over the same base type with different
> unit increments - e.g. one timestamp range with unit increment = 1s,
> and one with unit increment = 1m. Under the first type [4pm,5pm) =
> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].

Do we want to enable support for string based ranges, as in the
contributed prefix_range type?

> Thoughts?

I like the type interface approach. I think this concept has been
studied in great detail in mathematics and that we should start from
existing concepts, even if most of them are way over my head.

The ORDER BY problem refers to a metric space, defined by a distance
function. Continuing your proposal, the distance function's return type
would be in domain U. KNNGist is then a way to use the GiST index to
sort by distance.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01107.php

You'll see in this mail a proposal for an operator group notion, which
could get renamed to type interface if we think we won't need rings and
such rather than just groups in the future. And there's opportunity for
multi-type interfaces too (think families), like what's the distance
between a point and a circle?

Mathematical groups already have a notion of a neutral element, which
for addition is 0 (zero). We could extend our version of it with a
"unity" element, which would be in the T domain.

Then the range type could build on this and provide a different unity
value in its own interface, in the U domain this time. IMO tying the
precision of the range interval into the type interface is a bad
abstraction. As you said, we may want to have several range types
atop this.

We can say that [1,3] = [1,4) when considering a "default" integer range
because 4-3 = unity(integer). When considering a range over timestamps
with a range interval unity of 1s, we are still able to do the math, and
we can have another range over timestamps with a range interval unity of
10 mins in the same database. (I'm using this latter example with the
period datatype in a real application.)
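
As a rough illustration of the equivalence described above, here is a
minimal Python sketch. It assumes a discrete range whose unit increment
is supplied separately from the base type. The name to_half_open and
its unit parameter are hypothetical and not part of any PostgreSQL API.

```python
from datetime import datetime, timedelta


def to_half_open(lower, upper, unit):
    """Rewrite the closed range [lower, upper] as the half-open range [lower, upper + unit)."""
    return (lower, upper + unit)


# "Default" integer range: the unit is 1, so [1,3] and [1,4) describe the same set.
assert to_half_open(1, 3, 1) == (1, 4)

# Two ranges over the same timestamp base type, each with its own unit.
four_pm = datetime(2010, 4, 9, 16, 0, 0)
five_pm = datetime(2010, 4, 9, 17, 0, 0)
one_second = timedelta(seconds=1)
ten_minutes = timedelta(minutes=10)

# With a 1 s unit, [4pm, 4:59:59pm] is the same range as [4pm, 5pm).
assert to_half_open(four_pm, five_pm - one_second, one_second) == (four_pm, five_pm)

# With a 10 min unit, [4pm, 4:50pm] is the same range as [4pm, 5pm).
assert to_half_open(four_pm, five_pm - ten_minutes, ten_minutes) == (four_pm, five_pm)
```

The unit is passed per call rather than looked up from the base type,
which mirrors the idea that the unit belongs to the range type and not
to the type interface.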

While speaking of all that, in the prefix_range case, it'd be useful to
have a new kind of typemod system at the range level, to allow for
defining prefix text range with the '/' separator, say. Then

greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?)

Whereas currently

=> select '/etc/baz'::prefix_range | '/etc/bar';
?column?
--------------
/etc/ba[r-z]
(1 row)
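
To make the difference concrete, here is a small Python sketch that
contrasts a character-wise common prefix with a separator-aware one.
Python's posixpath functions are only a stand-in here; nothing below
says what prefix_range does or should return. Note that Python happens
to answer '/etc' without the trailing slash, which is just one of the
two answers left open above.

```python
import posixpath

paths = ["/etc/bar", "/etc/baz"]

# Character-wise common prefix: may stop in the middle of a path component.
char_prefix = posixpath.commonprefix(paths)

# Separator-aware common prefix: only whole '/'-delimited components count.
component_prefix = posixpath.commonpath(paths)

print(char_prefix)       # /etc/ba
print(component_prefix)  # /etc
```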

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Yeb Havinga on 9 Apr 2010 07:55

Robert Haas wrote:
> Under the first type [4pm,5pm) =
> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].
>
> Thoughts?
>
The examples with units look a lot like the IVL<PQ> datatype from HL7,
see
http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm

About a type interface, the HL7 spec talks about promotion from e.g. a
timestamp to an interval of timestamps ("interval" being HL7-speak for
a range), and demotion for the reverse direction. Every 'quantity type',
which is any type with a (possibly partially) linearly ordered domain,
can be promoted to an interval of that type. In PostgreSQL terms, this
could perhaps mean that by 'tagging' a datatype as a linear order, it
could automatically have a range type defined on it, as is currently
done for array types.

regards,
Yeb Havinga

From: Robert Haas on 9 Apr 2010 10:33

On Fri, Apr 9, 2010 at 7:55 AM, Yeb Havinga <yebhavinga(a)gmail.com> wrote:
> Robert Haas wrote:
>>
>> Under the first type [4pm,5pm) =
>> [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm].
>>
>> Thoughts?
>>
>
> The examples with units look a lot like the IVL<PQ> datatype from HL7, see
> http://www.hl7.org/v3ballot/html/infrastructure/datatypes_r2/datatypes_r2.htm
>
> About a type interface, the HL7 spec talks about promotion from e.g. a
> timestamp to an interval (hl7 speak for range) of timestamps (a range), and
> demotion for the back direction. Every 'quantity type', which is any type
> with a (possibly partially) linear ordered domain, can be promoted to an
> interval of that type. In PostgreSQL terms, this could perhaps mean that by
> 'tagging' a datatype as a linear order, it could automatically have a range
> type defined on it, like done for the array types currently.

The way we've handled array types is, quite frankly, horrible. It's
bad enough that we now have two catalog entries in pg_type for each
base type; what's even worse is that if we actually wanted to enforce
things like the number of array dimensions we'd need even more - say,
seven per base type: one for the base type itself, one for a
one-dimensional array, one for a two-dimensional array, one for a
three-dimensional array, and so on. And then if we want to support
range types that's another one for every base type, maybe more if
there's more than one kind of range over a base type. It's just not
feasible to handle derived types in a way that requires a new instance
of each base type to be created for each kind of derived type. It
scales as O(number of base types * number of kinds of derived type),
and that rapidly gets completely out of hand.
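
The growth can be put into numbers with a back-of-the-envelope Python
sketch. The figure of 100 base types is purely hypothetical, and the
six dimension-specific array kinds are an assumption chosen to match
the "seven per base type" figure above. The function pg_type_rows is
illustrative and not a real catalog query.

```python
def pg_type_rows(base_types, derived_kinds):
    """Catalog rows needed if every derived kind gets its own entry per base type."""
    return base_types * (1 + derived_kinds)


# Hypothetical installation with 100 base types.
two_per_base = pg_type_rows(100, 1)      # base type + one array type
per_dimension = pg_type_rows(100, 6)     # base type + six dimension-specific arrays
with_ranges = pg_type_rows(100, 6 + 1)   # ... plus one range type per base type

print(two_per_base, per_dimension, with_ranges)  # 200 700 800
```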

...Robert

From: Robert Haas on 9 Apr 2010 10:34

On Fri, Apr 9, 2010 at 10:33 AM, Robert Haas <robertmhaas(a)gmail.com> wrote:
> The way we've handled array types is, quite frankly, horrible. It's
> bad enough that we now have two catalog entries in pg_type for each
> base type; what's even worse is that if we actually wanted to enforce
> things like the number of array dimensions we'd need even more - say,
> seven per base type, one for the base type itself, one for a
> one-dimensional array, one for a two-dimensional array, one for a
> three-dimensional array. And then if we want to support range types
> that's another one for every base type, maybe more if there's more
> than one kind of range over a base type. It's just not feasible to
> handle derived types in a way that requires a new instance of each base
> type to be created for each kind of derived type. It scales as
> O(number of base types * number of kinds of derived type), and that
> rapidly gets completely out of hand

...which, by the way, doesn't mean that your idea is bad (although it
might not be what I would choose to do), just that I don't think our
current infrastructure can support it.

...Robert

From: Robert Haas on 9 Apr 2010 10:48

On Fri, Apr 9, 2010 at 4:10 AM, Dimitri Fontaine <dfontaine(a)hi-media.com> wrote:
> Do we want to enable support for string based ranges, as in the
> contributed prefix_range type?

Yes, probably, but that doesn't require as much knowledge of the
underlying data type, so I didn't feel it needed to be brought up in
this context. There is no x such that ['a','b') = ['a',x]; it's
generally impossible to convert between open and closed intervals in
this type of range type. Converting between them is the case where
type interfaces are needed; if you're not converting between different
kinds of intervals then you can probably get by with the existing
system of using the default btree opclass to find equality and
comparison operators.
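
The claim that no such x exists can be checked with a short Python
sketch. It assumes Python's code-point string ordering, which is not
necessarily how a PostgreSQL collation orders text. The helper names
are hypothetical. Whatever x in ['a','b') is proposed as the closed
upper bound, appending a character gives a strictly larger string that
is still below 'b', so x cannot be the largest member.

```python
def in_half_open(value, lower, upper):
    """True if value lies in the half-open range [lower, upper)."""
    return lower <= value < upper


def larger_member(x):
    """Given any x in ['a', 'b'), return a strictly larger string still in that range."""
    return x + "a"


for candidate in ["a", "az", "azzzz", "a\uffff"]:
    assert in_half_open(candidate, "a", "b")
    bigger = larger_member(candidate)
    # The candidate is not the maximum: something larger is still inside the range.
    assert bigger > candidate
    assert in_half_open(bigger, "a", "b")
```

For integers the same construction fails, because there is nothing
between 3 and 4; that is why a unit increment makes the conversion
possible for discrete types only.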

> I like the type interface approach and I think this concept has been
> studied in great detail in math and that we should start from existing
> concepts, even if most of them are way over my head.

I'm not too excited about patterning this too closely after
mathematical concepts; I think we need to have a pragmatic approach
that focuses on what the database actually needs. We need to think
generally enough about what we're trying to provide that we don't box
ourselves into a corner, but we're not trying to build a
theorem-prover.

> You'll see in this mail a proposal for an operator group notion, which
> could get renamed to type interface if we think we won't need rings and
> such rather than just groups in the future. And there's opportunity for
> multi-type interfaces too (think families), like what's the distance
> between a point and a circle?

Yeah, that needs some thought.

> The math groups already have a notion of neutral element, which for the
> addition is 0 (zero), we could expand our version of it with a "unity"
> element, which would be in the T domain.

I don't know what that would mean in this case. We're trying to add
and subtract from T, so a unit or identity element makes sense for U,
but not for T.
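
Python's datetime and timedelta behave much like the timestamp and
interval pair, so they can serve as an analogy for T and U. This is
only a sketch of the asymmetry, not a proposal for how the type
interface would be declared.

```python
from datetime import datetime, timedelta

t1 = datetime(2010, 4, 9, 16, 0)
t2 = datetime(2010, 4, 9, 17, 0)

# T - T yields a U (here a timedelta), and T + U yields a T again.
delta = t2 - t1
assert isinstance(delta, timedelta)
assert t1 + delta == t2

# U has an additive identity, so a "zero" is well defined there.
assert t1 + timedelta(0) == t1

# T + T is not defined at all, so T has no natural identity element for addition.
try:
    t1 + t2
    t_plus_t_defined = True
except TypeError:
    t_plus_t_defined = False
assert not t_plus_t_defined
```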

> Then the range type could expand on this and provide a different unity
> value in their own interface, in the U domain this time. IMO tying the
> precision of the range interval into the type interface is a bad
> abstraction. As you said we want to possibly have several range types
> atop this.

Right - so I think there's no point in specifying this in the type
interface at all. We can always add it later if we find a real need
for it.

> We can say that [1,3] = [1,4) when considering a "default" integer range
> because 4-3 = unity(integer). When considering a range over timestamps
> with a range interval unity of 1s, we are still able to do the math, and
> we can have another range over timestamps with a range interval unity of
> 10 mins in the same database. (I'm using this latter example with the
> period datatype in a real application).
>
> While speaking of all that, in the prefix_range case, it'd be useful to
> have a new kind of typemod system at the range level, to allow for
> defining prefix text range with the '/' separator, say. Then
>
> greater_prefix('/etc/bar', '/etc/baz') = '/etc' (or is it '/etc/'?)
>
> Whereas currently
>
> => select '/etc/baz'::prefix_range | '/etc/bar';
> ?column?
> --------------
> /etc/ba[r-z]
> (1 row)

Not sure I'm really following this.

...Robert
