Prev: Winflex
Next: [HACKERS] Fast or immediate shutdown
From: Scott Bailey on 16 Dec 2009 13:07 Jeff Davis wrote: > On Sun, 2009-12-13 at 23:49 -0800, Scott Bailey wrote: >> So basically I have an anyrange pseudo type with the functions prev, >> next, last, etc defined. So instead of hard coding range types, we would >> allow the user to define their own range types. Basically if we are able >> to determine the previous and next values of the base types we'd be able >> to define a range type. I'm envisioning in a manner much like defining >> an enum type. > > After an off-list discussion with Scott, I think there may be a solution > here that works for everyone if we don't try so hard to unify the > implementation of discrete and continuous ranges. The API should be very > similar, of course, but the implementation doesn't need to be. > > Continuous ranges absolutely require the following information: start, > end, and inclusivity information. > > But discrete ranges can instead be stored by counting the number of > granules from the start point. For instance, it could be stored as: > start, num_granules. > > That has a lot of benefits for discrete ranges of time. First of all, it > allows the algebra to work reasonably well for the "days" and "months" > part of the interval, so we can allow a granule of 1 day/week/month/year > for a timestamp range. For output of the range, we can then just > multiply the granule by the number of granules, and add that to the > start time; thus avoiding the "incremental addition" problem with date > math. I think this works reasonably well for timestamp/date ranges -- > let me know if there is a problem here (aside from timestamptz, which I > address below). > > Secondly, in the case of a timestamp range, we can use 7 bytes for > storing the number of granules rather than another full 8-byte > timestamp, leaving one byte for flags to represent NULL boundaries, > infinite boundaries, etc. For timestamps that would still mean that an > interval could be 2000 years long with '1 microsecond' granularity. For > dates, 3 bytes is sufficient for a date range 45000 years long with > granules of '1 day'. That means that we can get back down to a 16 byte > representation for timestamp ranges, or 8 byte representation for date > ranges. There are a few details, like infinite ranges, but those can be > pretty easily solved with flags as well. > > There's one problem, and that's for timestamptz ranges with intervals > that include days and months. Timezone adjustments are just not > well-defined for that kind of granule (nor would it be particularly > useful even if it magically worked), so this would have to be blocked > somehow. I think that's a special case, and we could provide the user > with a nice error message telling the user to use a date or timestamp > range instead. > > So, the idea is to default to a continuous range type, but if the user > supplies a granule, prior and next functions, and other necessary > details, then it becomes a discrete range type. > > * continuous ranges can still have everything that everyone wants, > including flags to indicate special values. > * discrete range granule is specified explicitly, so it's not an > "implementation detail" > * discrete ranges can have a compact representation > * discrete ranges would still have room for flags to indicate special > values > > Comments? As I pointed out off-list, I think the granularity for timestamp range should be limited to hours and smaller. Anything larger is asking for trouble. And quite honestly if they wanted day granularity, they should use date range. Also, I think the granule should be same type as returned when subtracting two subtypes. So granule of date range should be int not interval. And if user wanted something with month granularity, perhaps an enum range of 'YYYYMM' would be better. Quite honestly the following 3 cases would probably meet 99% of need: CREATE TYPE period AS RANGE(timestamptz(0), interval '1 s'); CREATE TYPE period AS RANGE(timestamptz(3), interval '1 ms'); CREATE TYPE period AS RANGE(timestamptz, interval '1 us'); -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Scott Bailey on 16 Dec 2009 13:21 Tom Lane wrote: > Jeff Davis <pgsql(a)j-davis.com> writes: >> [ hacky special-case representation for discrete timestamp ranges ] > > I'm still not exactly clear on what the use-case is for discrete > timestamp ranges, and I wonder how many people are going to be happy > with a representation that can't handle a range that's open-ended > on the left. They wouldn't. But the timestamp data would be the anchor, not necessarily the start point. As long as we ranges unbounded on both ends we'd be ok. -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Scott Bailey on 16 Dec 2009 13:57 Ok, silly question here. But how do you determine the length of a continuous range? By definition length of [a, b) and (a, b] = b-a. But what about (a,b) and [a,b]? Are we saying that because they are continuous, the difference between values included in the range and those excluded are so infinitesimally small so as not to matter? Thus length (a,b) == length [a,b] == length [a,b)? And if that is the case, does the inclusiveness of the range really even matter? And can anyone point me to a reference for working with continuous ranges? Google just insists that I spelled contiguous wrong. Scott -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 16 Dec 2009 13:59 Scott Bailey <artacus(a)comcast.net> writes: > As I pointed out off-list, I think the granularity for timestamp range > should be limited to hours and smaller. Anything larger is asking for > trouble. And quite honestly if they wanted day granularity, they should > use date range. I'm still not real clear on what the expected use-case is for this. You're evidently envisioning applications where the allowed form of an interval is constrained, but in the cases I can think of, the constraints are a lot more convoluted than what you're proposing. For example, if you're trying to do classroom scheduling, it might be useful to constrain the periods to start and end on hour boundaries --- but the next thing you'll want is to have it know that the "next" slot after 5pm Friday is 8am Monday. Except on holidays. And then there's the fact that my alma mater starts most hour-long classes on the half hour. I think that wiring such constraints into the low-level datatype is doomed to failure. What you'd be better off with is a function that finds the "next" period given a current period and some suitable representation of the off-limits intervals. The argument for having granularity wired into the datatype seems to boil down to just space savings. I don't find that compelling enough to justify code contortions and user-visible restrictions on functionality. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Jeff Davis on 16 Dec 2009 14:25
On Wed, 2009-12-16 at 12:42 -0500, Robert Haas wrote: > On Wed, Dec 16, 2009 at 12:31 PM, Jeff Davis <pgsql(a)j-davis.com> wrote: > > There's one problem, and that's for timestamptz ranges with intervals > > that include days and months. Timezone adjustments are just not > > well-defined for that kind of granule (nor would it be particularly > > useful even if it magically worked), so this would have to be blocked > > somehow. I think that's a special case, and we could provide the user > > with a nice error message telling the user to use a date or timestamp > > range instead. > > This seems like a fairly special-purpose type. You'd be targeting it > at people who are very concerned with storing large numbers of these > (so they really care about space consumption) but for some reason > don't need to mix days and months (actually, the current interval > representation stores days, months, and seconds separately). I > certainly think this might be useful to some people but it doesn't > really sounds like a general range type facility, since it seems to > involve some hacks that are fairly datatype-specific. My statement should have read "days or months". In other words, you can't have a timestamptz range with a granularity of '3 days'. But if that's your granularity, as Scott says, you should be using a date range, not a timestamptz range. Timestamptz ranges are only really useful when you have a granularity measured in seconds (or some fraction or multiple thereof). Otherwise, the timezone adjustment doesn't make any sense. So this isn't a case of limited functionality, just that we need to inform the user that a timestamptz range with granularity '1 day' or '1 month' makes no sense. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |