Range types [PgSql]

Prev: Winflex
Next: [HACKERS] Fast or immediate shutdown

From: Jeff Davis on 16 Dec 2009 14:29

On Wed, 2009-12-16 at 12:50 -0500, Tom Lane wrote:
> I'm still not exactly clear on what the use-case is for discrete
> timestamp ranges, and I wonder how many people are going to be happy
> with a representation that can't handle a range that's open-ended
> on the left.

Huh? We're miscommunicating somewhere. Discrete ranges are values, and
those values can be displayed a number of different ways. That's one of
the biggest advantages.

The very same discrete range can be displayed as open-open, closed-open,
open-closed, or closed-closed. There are edge cases, like how infinity
is never closed, and overflow conditions. But generally speaking, you
have more options for presenting a discrete range than a continuous
range.

The range [5, 7) is equal to the set {5, 6} and equal to the ranges:
(4,7), (4,6], [5,7), and [5,6]. One application can insert it as [5,7)
and another can read it as (4,6].

That's the use case: the application's preferences don't have to match.
It's OK to mix various representation preferences, because you can
convert between them. The on disk format happens to hint at one
particular canonical form, but doesn't enforce that on anyone.

> Huh? You're not going to be able to have a special case data
> representation for one or two data types at the same time as you have a
> function-based datatype-independent concept of a parameterized range
> type. Well, maybe you could have special code paths for just date and
> timestamp but it'd be horrid.

They aren't supposed to be exactly the same API, I said that from the
start. There are API differences between continuous and discrete ranges,
and we shouldn't ignore them.

One important differences is that (barring overflow conditions and
special values) prior, first, last, and next are defined for all
discrete range values, but not for all continuous range values.

For instance, the discrete range [5,7) has prior=4, first=5, last=6,
next=7.

Whereas the continuous range [5,7) has prior=undef, first=5, last=undef,
next=7.

We could define one API, that treats discrete and continuous ranges
differently. But you'll never be able to transform a continuous range to
a different representation, while you can do so with a discrete range.

> More importantly, the notion of a representation granule is still 100%
> wishful thinking for any inexact-representation datatype, which is going
> to be a severe crimp in getting this accepted for timestamp, let alone
> defining it in a way that would allow users to try to apply it to
> floats. Float timestamps might not be the default case anymore but they
> are still supported.

If the only objection is that a superuser can confuse the system by
poorly defining a range type on a non-default build, I think that
objection can be overcome.

> I think you should let go of the feeling that you have to shave bytes
> off the storage format. You're creating a whole lot of work for
> yourself and a whole lot of user-visible corner cases in return for
> what ultimately isn't much.

This isn't just to shave bytes. It's also because I like the semantics
of discrete ranges for many cases.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on 16 Dec 2009 14:45

tomas(a)tuxteam.de wrote:

> (and as Andrew Dunstan pointed out off-list: I was wrong with my bold
> assertion that one can squeeze infinitely many (arbitrary length)
> strings between two given. This is not always the case).

Of course you can do that if you assume lexicographical order, or any
other arbitrary order. The interesting point is whether there exists
some ordering on which this does not happen. And in fact there is:
order strings by length first, and then lexicographically. If you do
this then you have next() and prev() for any given string. If you use
ASCII only, you have a.next = b, and so on.

There is the argument that some languages do not sort lexicographically
but this is also besides the point -- you only need to find *some* way
to sort the characters in the alphabet. If you dictate that in your
ordering "�" comes before "�" and both after "a", and all of them before
b, then you know that a.next = � and �.next = � and �.next = b. (Note
that I have also dictated that there is no other character that sorts
after a and before b, which is perfectly possible because the alphabet
is fixed for any given language. You could use the complete list of
characters coded in a given set of Unicode planes, or even extant all
planes, to obtain the same result).

Defining strings with this ordering means you can have some useful
ranges like [a-z], but then you cannot meaningfully use ranges for
things like [aardvark - zulu] --- note that in this particular example,
the order is reversed, because zulu comes before aardvark which is
probably not what you want. zzz.next = aaaa

In short, I think that while it is possible to define ranges of strings,
it is not as useful as one would like.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Martijn van Oosterhout on 16 Dec 2009 15:07

On Tue, Dec 15, 2009 at 04:29:26PM -0800, Jeff Davis wrote:
> On Tue, 2009-12-15 at 18:06 -0600, decibel wrote:
> > Now that varlena's don't have an enormous fixed overhead, perhaps it's
> > worth looking at using them. Obviously some operations would be
> > slower, but for your stated examples of auditing and history, I
> > suspect that you're not going to notice the overhead that much.
>
> For most varvarlena types, you only get stuck with the full alignment
> burden if you get unlucky. In this case, we're moving from 16 bytes to
> 17, which really means 24 bytes with alignment. Try creating two tables:
>
> create table foo(i int8, t1 timestamp, t2 timestamp);
> create table bar(i int8, c "char", t1 timestamp, t2 timestamp);

But a period type will take just one or two more bytes if you don't
require alignment. Alignment on a varlena type seems silly anyway,
since you'll be aligning the header byte rather than the content.

In the implementation you may need to copy the content before
processing to satisfy the alignment of the contained type, but that's
just a SMOP.

Have a nice day,
--
Martijn van Oosterhout <kleptog(a)svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

From: Martijn van Oosterhout on 16 Dec 2009 15:24

On Wed, Dec 16, 2009 at 10:57:19AM -0800, Scott Bailey wrote:
> Ok, silly question here. But how do you determine the length of a
> continuous range? By definition length of [a, b) and (a, b] = b-a. But
> what about (a,b) and [a,b]? Are we saying that because they are
> continuous, the difference between values included in the range and
> those excluded are so infinitesimally small so as not to matter? Thus
> length (a,b) == length [a,b] == length [a,b)? And if that is the case,
> does the inclusiveness of the range really even matter?

Short answer: Yes

Longer answer: You need to decide on your definition of "length" and
what you usually use is the "measure". And yes, the difference between
the two is so called "measure 0" and thus has no effect on the length.

Note the measure has to be done considering the intervals as intervals
on a real line. The integers by themselves have no measure (they are
countable). So for the "length" of a set of integers you might consider
the count of the set.

http://planetmath.org/encyclopedia/ProofThatTheOuterLebesgueMeasureOfAnIntervalIsItsLength.html
http://en.wikipedia.org/wiki/Outer_measure

As for "continuous", as you use it above is not a way I recognise.
There are contiguous sets, but they are something else.

Have a nice day,
--
Martijn van Oosterhout <kleptog(a)svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

From: Tom Lane on 16 Dec 2009 15:26

Alvaro Herrera <alvherre(a)commandprompt.com> writes:
> In short, I think that while it is possible to define ranges of strings,
> it is not as useful as one would like.

Note it is not the *range* that is the problem, it is the assumption
that there's a unique "next" string. There's no unique next in the
reals or rationals either, but we have no problem considering intervals
over those sets.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: Winflex
Next: [HACKERS] Fast or immediate shutdown