Range types [PgSql]

Prev: Winflex
Next: [HACKERS] Fast or immediate shutdown

From: Scott Bailey on 16 Dec 2009 13:07

Jeff Davis wrote:
> On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote:
>> So basically I have an anyrange pseudo type with the functions prev,
>> next, last, etc defined. So instead of hard coding range types, we would
>> allow the user to define their own range types. Basically if we are able
>> to determine the previous and next values of the base types we'd be able
>> to define a range type. I'm envisioning in a manner much like defining
>> an enum type.
>
> After an off-list discussion with Scott, I think there may be a solution
> here that works for everyone if we don't try so hard to unify the
> implementation of discrete and continuous ranges. The API should be very
> similar, of course, but the implementation doesn't need to be.
>
> Continuous ranges absolutely require the following information: start,
> end, and inclusivity information.
>
> But discrete ranges can instead be stored by counting the number of
> granules from the start point. For instance, it could be stored as:
> start, num_granules.
>
> That has a lot of benefits for discrete ranges of time. First of all, it
> allows the algebra to work reasonably well for the "days" and "months"
> part of the interval, so we can allow a granule of 1 day/week/month/year
> for a timestamp range. For output of the range, we can then just
> multiply the granule by the number of granules, and add that to the
> start time; thus avoiding the "incremental addition" problem with date
> math. I think this works reasonably well for timestamp/date ranges --
> let me know if there is a problem here (aside from timestamptz, which I
> address below).
>
> Secondly, in the case of a timestamp range, we can use 7 bytes for
> storing the number of granules rather than another full 8-byte
> timestamp, leaving one byte for flags to represent NULL boundaries,
> infinite boundaries, etc. For timestamps that would still mean that an
> interval could be 2000 years long with '1 microsecond' granularity. For
> dates, 3 bytes is sufficient for a date range 45000 years long with
> granules of '1 day'. That means that we can get back down to a 16 byte
> representation for timestamp ranges, or 8 byte representation for date
> ranges. There are a few details, like infinite ranges, but those can be
> pretty easily solved with flags as well.
>
> There's one problem, and that's for timestamptz ranges with intervals
> that include days and months. Timezone adjustments are just not
> well-defined for that kind of granule (nor would it be particularly
> useful even if it magically worked), so this would have to be blocked
> somehow. I think that's a special case, and we could provide the user
> with a nice error message telling the user to use a date or timestamp
> range instead.
>
> So, the idea is to default to a continuous range type, but if the user
> supplies a granule, prior and next functions, and other necessary
> details, then it becomes a discrete range type.
>
> * continuous ranges can still have everything that everyone wants,
> including flags to indicate special values.
> * discrete range granule is specified explicitly, so it's not an
> "implementation detail"
> * discrete ranges can have a compact representation
> * discrete ranges would still have room for flags to indicate special
> values
>
> Comments?

As I pointed out off-list, I think the granularity for timestamp range
should be limited to hours and smaller. Anything larger is asking for
trouble. And quite honestly if they wanted day granularity, they should
use date range. Also, I think the granule should be same type as
returned when subtracting two subtypes. So granule of date range should
be int not interval. And if user wanted something with month
granularity, perhaps an enum range of 'YYYYMM' would be better.

Quite honestly the following 3 cases would probably meet 99% of need:
CREATE TYPE period AS RANGE(timestamptz(0), interval '1 s');
CREATE TYPE period AS RANGE(timestamptz(3), interval '1 ms');
CREATE TYPE period AS RANGE(timestamptz, interval '1 us');

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Scott Bailey on 16 Dec 2009 13:21

Tom Lane wrote:
> Jeff Davis <pgsql(a)j-davis.com> writes:
>> [ hacky special-case representation for discrete timestamp ranges ]
>
> I'm still not exactly clear on what the use-case is for discrete
> timestamp ranges, and I wonder how many people are going to be happy
> with a representation that can't handle a range that's open-ended
> on the left.

They wouldn't. But the timestamp data would be the anchor, not
necessarily the start point. As long as we ranges unbounded on both ends
we'd be ok.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Scott Bailey on 16 Dec 2009 13:57

Ok, silly question here. But how do you determine the length of a
continuous range? By definition length of [a, b) and (a, b] = b-a. But
what about (a,b) and [a,b]? Are we saying that because they are
continuous, the difference between values included in the range and
those excluded are so infinitesimally small so as not to matter? Thus
length (a,b) == length [a,b] == length [a,b)? And if that is the case,
does the inclusiveness of the range really even matter?

And can anyone point me to a reference for working with continuous
ranges? Google just insists that I spelled contiguous wrong.

Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 16 Dec 2009 13:59

Scott Bailey <artacus(a)comcast.net> writes:
> As I pointed out off-list, I think the granularity for timestamp range
> should be limited to hours and smaller. Anything larger is asking for
> trouble. And quite honestly if they wanted day granularity, they should
> use date range.

I'm still not real clear on what the expected use-case is for this.
You're evidently envisioning applications where the allowed form of
an interval is constrained, but in the cases I can think of, the
constraints are a lot more convoluted than what you're proposing.
For example, if you're trying to do classroom scheduling, it might be
useful to constrain the periods to start and end on hour boundaries
--- but the next thing you'll want is to have it know that the "next"
slot after 5pm Friday is 8am Monday. Except on holidays. And then
there's the fact that my alma mater starts most hour-long classes on
the half hour.

I think that wiring such constraints into the low-level datatype is
doomed to failure. What you'd be better off with is a function that
finds the "next" period given a current period and some suitable
representation of the off-limits intervals. The argument for having
granularity wired into the datatype seems to boil down to just space
savings. I don't find that compelling enough to justify code
contortions and user-visible restrictions on functionality.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 16 Dec 2009 14:25

On Wed, 2009-12-16 at 12:42 -0500, Robert Haas wrote:
> On Wed, Dec 16, 2009 at 12:31 PM, Jeff Davis <pgsql(a)j-davis.com> wrote:
> > There's one problem, and that's for timestamptz ranges with intervals
> > that include days and months. Timezone adjustments are just not
> > well-defined for that kind of granule (nor would it be particularly
> > useful even if it magically worked), so this would have to be blocked
> > somehow. I think that's a special case, and we could provide the user
> > with a nice error message telling the user to use a date or timestamp
> > range instead.
>
> This seems like a fairly special-purpose type. You'd be targeting it
> at people who are very concerned with storing large numbers of these
> (so they really care about space consumption) but for some reason
> don't need to mix days and months (actually, the current interval
> representation stores days, months, and seconds separately). I
> certainly think this might be useful to some people but it doesn't
> really sounds like a general range type facility, since it seems to
> involve some hacks that are fairly datatype-specific.

My statement should have read "days or months". In other words, you
can't have a timestamptz range with a granularity of '3 days'. But if
that's your granularity, as Scott says, you should be using a date
range, not a timestamptz range.

Timestamptz ranges are only really useful when you have a granularity
measured in seconds (or some fraction or multiple thereof). Otherwise,
the timezone adjustment doesn't make any sense.

So this isn't a case of limited functionality, just that we need to
inform the user that a timestamptz range with granularity '1 day' or '1
month' makes no sense.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Prev: Winflex
Next: [HACKERS] Fast or immediate shutdown