From: Tom Lane
Scott Bailey <artacus(a)comcast.net> writes:
> Ok, let me give an example of what we can do with the current
> implementations that would not be possible with timestamps if we
> implement as suggested. ...
> The function below takes two period arrays that can have overlapping and
> adjacent elements. It subtracts all values in pa1 that intersect with
> values in pa2. So perhaps pa1 is all of your work shifts for the month
> and pa2 is a combination of your leave and holidays. The result is a
> coalesced non-contiguous set of the times you would actually be working.

The proposed problem is certainly soluble without any assumptions
of discreteness. The answer might not look very much like the way
you chose to code it here, but that's not an argument for adopting
a fundamentally incorrect worldview. If this were an amazingly
short and beautiful piece of code, it might support your argument,
but it's neither.

regards, tom lane

From: Scott Bailey
Tom Lane wrote:
> Jeff Davis <pgsql(a)j-davis.com> writes:
>> On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote:
>>> FWIW, I think it would be a good idea to treat timestamps as
>>> continuous in all cases.
>
>> I disagree. There is a lot of value in treating timestamp ranges as
>> discrete.
>
>> One big reason is that the ranges can be translated between the
>> different input/output forms, and there's a canonical form. As we know,
>> a huge amount of the value in an RDBMS is unifying data from multiple
>> applications with different conventions.
>
> Actually, that is exactly one of the reasons why what you propose is
> a *bad* idea. You want to institutionalize application dependence on
> a non-portable implementation detail, namely the granularity of machine
> representation of what's in principle a continuous value. That's one
> of the fastest routes to non-unifiable data I can think of.
>
>> So, let's say one application uses (] and another uses [). If you are
>> mixing the data and returning it to the application, you want to be able
>> to provide the result according to its convention. You can't do that
>> with a continuous range.
>
> The above is nonsense. [1,2) and [1,2] are simply different objects.
> A design that assumes that it is always possible to replace one by
> the other is broken so badly it's not even worth discussing.

I don't hear anyone arguing that. But you should be able to convert
between [1,2], [1,3), (0,3), and (0,2].

> The only reason you'd have applications that fail to handle both open
> and closed intervals would be if someone were to create an
> implementation that didn't support both from the outset. Which we
> need not and should not do.
>
>> And things get more interesting: if you mix (] and [), then range_union
>> will produce () and range_intersect will produce []. So now you have all
>> four conventions floating around the same database.
>
> Which is why it's a good idea to support all four...

I don't understand you then. Where do you suppose we would define the
inclusiveness for the value? At the type level, at the column level, or
at the value level? A design that allows values of different
inclusiveness and offers no means to convert from one to another is
worthless.

From: Scott Bailey
> If this were an amazingly
> short and beautiful piece of code, it might support your argument,
> but it's neither.

Well, we can't all be arrogant brainiacs.


From: Tom Lane
I wrote:
> The proposed problem is certainly soluble without any assumptions
> of discreteness.

To be concrete, I think it could be approached like this:

Assume the datatype provides a built-in function

    period_except(p1 period, p2 period) returns setof period

which can return zero, one, or two rows depending on the inputs:

no rows if p1 is completely contained in p2

one row if p1 partially overlaps p2, for example:

    [1,4] except [3,5]  returns [1,3)
    [4,6] except [1,5)  returns [5,6]

two rows if p1 properly contains p2, for example:

    [1,10] except [4,5]  returns [1,4) and (5,10]
    [1,10] except [9,10) returns [1,9) and [10,10]

and of course just p1 if p1 and p2 don't overlap at all.
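
For concreteness, here is a rough sketch of how period_except itself might
be coded, over a hypothetical composite type that carries explicit
inclusivity flags.  (Numeric bounds are used only so the examples above
are easy to try interactively; the real type would presumably hold
timestamps.  This is just an illustration of the case analysis, not a
proposed implementation.)

    -- hypothetical stand-in for the period type: bounds plus
    -- open/closed flags at each end
    create type period as (lo numeric, lo_inc boolean,
                           hi numeric, hi_inc boolean);

    create function period_except(p1 period, p2 period)
    returns setof period language plpgsql immutable as $$
    begin
        -- no overlap at all: p1 survives untouched
        if p2.hi < p1.lo or (p2.hi = p1.lo and not (p2.hi_inc and p1.lo_inc))
           or p1.hi < p2.lo or (p1.hi = p2.lo and not (p1.hi_inc and p2.lo_inc)) then
            return next p1;
            return;
        end if;
        -- piece of p1 below p2: ends just short of p2's lower bound,
        -- or includes it when p2 is open there
        if p1.lo < p2.lo or (p1.lo = p2.lo and p1.lo_inc and not p2.lo_inc) then
            return next row(p1.lo, p1.lo_inc, p2.lo, not p2.lo_inc)::period;
        end if;
        -- piece of p1 above p2, by the symmetric rule
        if p2.hi < p1.hi or (p2.hi = p1.hi and p1.hi_inc and not p2.hi_inc) then
            return next row(p2.hi, not p2.hi_inc, p1.hi, p1.hi_inc)::period;
        end if;
        -- p1 completely contained in p2: no rows
    end;
    $$;

    -- e.g. [1,4] except [3,5] gives one row, [1,3):
    select * from period_except(row(1,true,4,true)::period,
                                row(3,true,5,true)::period);

Note how the open/closed flags of the results fall out of the flags of the
inputs; that is where the [1,3) and [10,10] above come from.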

Given such a function, it's a simple matter of successively removing each
element of p2[] from the set representing the current members of p1[].
The way that I'd find most natural to code that is a loop, along the
lines of:

foreach p2_member in array p2 loop
    -- keep only the parts of each current p1 member that survive
    -- subtracting this p2 member
    p1 := array(select period_except(p1_member, p2_member)
                from unnest(p1) p1_member);
end loop;

But maybe it can be done in a single SQL command.
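
For instance (building on the period_except sketch above, and again only a
sketch), a recursive CTE can fold the p2 elements in one at a time, so the
whole thing becomes a single statement:

    -- step i holds what is left of p1 after removing the first i
    -- elements of p2
    create function period_array_except(p1 period[], p2 period[])
    returns period[] language sql as $$
        with recursive subtract(i, pieces) as (
            select 0, p1
          union all
            select s.i + 1,
                   array(select period_except(m, p2[s.i + 1])
                         from unnest(s.pieces) m)
            from subtract s
            where s.i < coalesce(array_length(p2, 1), 0)
        )
        select pieces from subtract order by i desc limit 1;
    $$;

Whether that is any more readable than the loop is debatable, but it does
show that nothing beyond set operations on open/closed periods is needed.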

As this example makes clear, when dealing with continuous intervals you
*must* admit both open and closed intervals, else you don't have a way
to represent the results of "except". Maybe part of the failure to
communicate here arises from your desire to try to avoid supporting both
kinds of intervals. But I think you really have to do it if you want to
deal with data that hasn't got any natural granularity.

regards, tom lane

From: Jeff Davis
On Tue, 2009-12-15 at 17:17 -0500, Tom Lane wrote:
> Actually, that is exactly one of the reasons why what you propose is
> a *bad* idea. You want to institutionalize application dependence on
> a non-portable implementation detail, namely the granularity of machine
> representation of what's in principle a continuous value. That's one
> of the fastest routes to non-unifiable data I can think of.

Based on the premise that timestamps are a continuous value and the
granularity/precision is entirely an implementation detail, you're
right. But I disagree with the premise, at least in some cases that I
think are worthwhile. See reference [1].

> The above is nonsense. [1,2) and [1,2] are simply different objects.
> A design that assumes that it is always possible to replace one by
> the other is broken so badly it's not even worth discussing.

I don't understand this point at all. [1,2) and [1,2] are different
values. Of course they are not interchangeable.

If you think I'm proposing that we drop inclusivity/exclusivity before
telling the application, that's not what I'm proposing at all. I'm
proposing that, at least in some circumstances, it's important to be
able to display the same value in different formats -- e.g. [1, 3) or
[1, 2], depending on what the application expects. Similar to a timezone
adjustment.
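
To make that concrete, here's a rough sketch of the kind of conversion I
mean, assuming integer granularity and a made-up show_range() helper; the
stored form is a canonical [lo,hi):

    -- hypothetical helper: render one stored discrete range in whatever
    -- convention the application expects
    create function show_range(lo int, hi int, style text)
    returns text language sql immutable as $$
        select case style
                 when '[)' then format('[%s,%s)', lo,     hi)
                 when '[]' then format('[%s,%s]', lo,     hi - 1)
                 when '(]' then format('(%s,%s]', lo - 1, hi - 1)
                 when '()' then format('(%s,%s)', lo - 1, hi)
               end;
    $$;

    -- the same stored value shown two ways:
    select show_range(1, 3, '[)') as closed_open,    -- [1,3)
           show_range(1, 3, '[]') as closed_closed;  -- [1,2]

The +1/-1 arithmetic there is only meaningful because of the granularity,
which is exactly the value I see in discrete ranges.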

Regards,
Jeff Davis

[1] "Temporal Data and the Relational Model" by C.J. Date, et al., uses
discrete time throughout the entire book, aside from a brief discussion
at the beginning.

