Range types [PgSql]

From: Jeff Davis on 16 Dec 2009 15:35

On Wed, 2009-12-16 at 13:59 -0500, Tom Lane wrote:
> The argument for having
> granularity wired into the datatype seems to boil down to just space
> savings. I don't find that compelling enough to justify code
> contortions and user-visible restrictions on functionality.

The argument (at least from me) is that discrete ranges have better
semantics. The counterargument was that the granularity of a timestamp
is an implementation detail. So I countered by making it explicit.

Space savings is not crucial, but it would be frustrating to needlessly
waste space.

I still have not seen an answer to the problem of changing the
representation of a continuous range. If you have the continuous range
[5, 10], you're pretty much stuck with that representation, even if the
application is expecting things in the form [ ).

From an application's standpoint, you probably want to get the
information about a range as separate columns (as opposed to parsing a
string). So an application expecting data in [) format might use a query
like:

select ..., first(myperiod), next(myperiod) from mytable;

That gives the application complete information about the range. You can
even make a view over a table like that to make it even more transparent
to the application. It's not entirely unreasonable that many such
applications exist; there are many presentations and tutorials that have
been telling people to use a "start" and "end" column, and assume that
the start is inclusive and the end is exclusive.

If there is some other application that expects data in (] format, you
just use the query:

select ..., prior(myperiod), last(myperiod) from mytable;

With discrete ranges, that all just works (barring overflow or special
values).
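
A minimal sketch of this point, assuming integer-valued ranges. The Python names below are hypothetical stand-ins for the proposed first(), next(), prior() and last() functions, not an existing API; overflow and special values are ignored:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteRange:
    """Integer range stored in one canonical form: [first, next_)."""
    first: int   # inclusive lower bound
    next_: int   # exclusive upper bound

    @property
    def prior(self) -> int:
        # Exclusive lower bound for the (] view: the value just before first.
        return self.first - 1

    @property
    def last(self) -> int:
        # Inclusive upper bound for the (] view: the value just before next_.
        return self.next_ - 1


def from_closed(lo: int, hi: int) -> DiscreteRange:
    # Over the integers, [lo, hi] holds the same values as [lo, hi + 1).
    return DiscreteRange(lo, hi + 1)


r = from_closed(5, 10)
half_open = (r.first, r.next_)    # the [ ) view: [5, 11)
open_closed = (r.prior, r.last)   # the ( ] view: (4, 10]
```

Whichever form a value was entered in, every reader can ask for the pair of bounds it expects, because stepping by one unit is always defined.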

With continuous ranges, first() or next() might fail on some values that
were produced by some other application. Really, for continuous ranges,
you'd need to have a query like:

select ..., start(myperiod), start_inclusive(myperiod),
end(myperiod), end_inclusive(myperiod) from mytable;

in order to have all of the information. And that means that the
application needs a full implementation of a range type to understand
the inclusivity and produce a correct result.
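
A rough sketch of what the application would have to carry around, assuming real-valued bounds. The class and field names are hypothetical and simply mirror the four columns in the query above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContinuousRange:
    """Range over a continuous domain; both inclusivity flags are needed."""
    start: float
    start_inclusive: bool
    end: float
    end_inclusive: bool

    def contains(self, x: float) -> bool:
        # Each bound is tested with <= or < depending on its flag.
        above = x >= self.start if self.start_inclusive else x > self.start
        below = x <= self.end if self.end_inclusive else x < self.end
        return above and below


closed = ContinuousRange(5.0, True, 10.0, True)      # [5, 10]
half_open = ContinuousRange(5.0, True, 10.0, False)  # [5, 10)

# Same endpoints, different membership at the upper bound.
at_end = (closed.contains(10.0), half_open.contains(10.0))
```

With no "next" value to step to, [5, 10] cannot be rewritten in [ ) form, so the flags have to travel with the bounds.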

And to further make the case for allowing user-defined discrete ranges,
what about ip4r? That's a discrete range, and it's user-defined. And
that probably means other useful discrete ranges will be invented, as
well.

If we want to say that there is no discrete TIMESTAMP range by default,
and that the superuser has to define it, that's one thing. But if we say
that the only acceptable base types for discrete ranges will be
hard-wired, that's way too restrictive. If nothing else, it makes some
system types "special", which we have not done in very many other places.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Alvaro Herrera on 16 Dec 2009 15:41

Tom Lane wrote:
> Alvaro Herrera <alvherre(a)commandprompt.com> writes:
> > In short, I think that while it is possible to define ranges of strings,
> > it is not as useful as one would like.
>
> Note it is not the *range* that is the problem, it is the assumption
> that there's a unique "next" string. There's no unique next in the
> reals or rationals either, but we have no problem considering intervals
> over those sets.

Yeah, agreed. It's easy (I think) to define more useful ranges of
strings if you don't insist on having "next".

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From: Tom Lane on 16 Dec 2009 15:42

Martijn van Oosterhout <kleptog(a)svana.org> writes:
> But a period type will take just one or two more bytes if you don't
> require alignment. Alignment on a varlena type seems silly anyway,
> since you'll be aligning the header byte rather than the content.

You might still end up paying the alignment overhead after the field,
of course. But avoiding embedding the alignment in the type itself
seems worth doing.

One idea that might be interesting to consider in this regard is
force-packing varlena range values. Consider a text range('a', 'b').
The datums are likely to come in with 4-byte headers requiring
alignment. If we have the smarts to force them to 1-byte header form
inside the varlena range value, not only do we save bytes right there,
but we don't have to expend cycles to copy them somewhere to re-align
them before we can pass them to the datatype-specific functions.

regards, tom lane

From: Tom Lane on 16 Dec 2009 15:46

Jeff Davis <pgsql(a)j-davis.com> writes:
> On Wed, 2009-12-16 at 12:50 -0500, Tom Lane wrote:
>> I'm still not exactly clear on what the use-case is for discrete
>> timestamp ranges, and I wonder how many people are going to be happy
>> with a representation that can't handle a range that's open-ended
>> on the left.

> Huh? We're miscommunicating somewhere.

Yeah, apparently. By open-ended I meant -infinity left bound, or null
left bound if you prefer. Not sure if there's a better term.

regards, tom lane

From: Tom Lane on 16 Dec 2009 15:57

Jeff Davis <pgsql(a)j-davis.com> writes:
> On Wed, 2009-12-16 at 13:59 -0500, Tom Lane wrote:
>> The argument for having
>> granularity wired into the datatype seems to boil down to just space
>> savings. I don't find that compelling enough to justify code
>> contortions and user-visible restrictions on functionality.

> The argument (at least from me) is that discrete ranges have better
> semantics. The counterargument was that the granularity of a timestamp
> is an implementation detail. So I countered by making it explicit.

Making it explicit doesn't fix the fact that you can't rely on the
arithmetic to be exact.
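
One way to picture the concern, assuming timestamps held as floating-point seconds. That storage format, the 0.1-second granularity, and the next_value helper are all assumptions made for this sketch, not details taken from the thread:

```python
GRANULARITY = 0.1  # hypothetical explicit granularity, in seconds


def next_value(t: float) -> float:
    # "next" defined as adding one granule to a floating-point value.
    return t + GRANULARITY


t = 0.0
for _ in range(3):
    t = next_value(t)

# Three steps of 0.1 do not land exactly on 0.3 in binary floating point.
exact = (t == 0.3)
```

Here exact ends up False even though the granularity is stated explicitly, so two ranges that should share an endpoint may fail to compare equal.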

> I still have not seen an answer to the problem of changing the
> representation of a continuous range. If you have the continuous range
> [5, 10], you're pretty much stuck with that representation, even if the
> application is expecting things in the form [ ).

That is not our problem. It's the application's problem if it can't
handle the concept. You might as well be complaining that type numeric
is broken because it can represent values that will fail to fit into
float8 when some application tries to force them into that form.

> And to further make the case for allowing user-defined discrete ranges,
> what about ip4r?

What about it? I don't have a problem with the concept that next() is
well defined for some datatypes. I just have a problem with the concept
that it's well-defined for timestamps. It's not, and I don't believe
that forcing it to have some definition is useful in the real world
(which, as a rule, isn't going to fit the simplifying assumptions you
have to make to make it even sort-of work).

regards, tom lane