From: Jeff Davis on 15 Dec 2009 16:25

On Tue, 2009-12-15 at 13:15 -0500, Tom Lane wrote:
> You probably need some flag bits anyway, so flailing frantically to
> avoid that doesn't seem like a profitable use of time.

I think "need" and "flailing" are both a little too strong here. The
biggest use case will almost certainly be ranges of timestamps, and most
of those people will have no use for flag bits (or if they do, it might
not be worth an 8-byte-per-value overhead).

> One pretty obvious use for a flag bit is open-ended ranges, ie
>     range(something, infinity)
> You could only do this without a flag bit if the underlying datatype
> has an "infinity" value, which not all do.

True, but frustrating for people whose primary use case is timestamps or
floats, which already have 'infinity'. Also frustrating for people who
don't mind using the min/max integer as a workaround.

> I'm also wondering what null range boundaries would do. Maybe that's
> the same as the infinity case, or maybe not.

I would prefer to avoid allowing NULL range boundaries for the following
reasons:

* it reminds me of MySQL dates with zeros in them
* we are not guided by any kind of standard
* we'd have to invent semantics that are pretty far outside of those
  defined for mathematical intervals
* we could easily create more confusion or allow subtle traps for users
* we aren't guided by a clear use case (or I haven't seen it) that isn't
  equally solvable (or better) using some other method
* it would impose that extra 8-byte storage burden on people who may not
  need any flag bits

Regards,
    Jeff Davis
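For illustration of the workaround Jeff mentions: timestamptz already
accepts 'infinity' as a value, so an open-ended range needs no flag bit
when the element type has its own infinity. The period() constructor
below is the one being discussed in this thread, not an existing
built-in, so this is only a sketch:

    -- 'infinity' is already a legal timestamptz value:
    SELECT 'infinity'::timestamptz;

    -- A hypothetical open-ended range, spelled with no flag bits,
    -- using the proposed period() constructor:
    SELECT period('2009-12-15 00:00'::timestamptz, 'infinity'::timestamptz);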
From: Tom Lane on 15 Dec 2009 16:52

Jeff Davis <pgsql(a)j-davis.com> writes:
> I think "need" and "flailing" are both a little too strong here. The
> biggest use case will almost certainly be ranges of timestamps, and most
> of those people will have no use for flag bits (or if they do, it might
> not be worth an 8-byte-per-value overhead).

When the alternatives are a crippled implementation that might not do
what I want at all, or a full-featured implementation that takes another
8 bytes per value, I'll take the second. The 8 bytes don't matter if it
doesn't solve my problem.

> I would prefer to avoid allowing NULL range boundaries for the following
> reasons:
> * it reminds me of MySQL dates with zeros in them

If we use that notation to represent an open-ended interval, it seems
perfectly reasonable to me. And it doesn't require any assumptions about
whether the underlying type has an infinity.

I think it's a seriously bad idea to tell people that they should depend
on min or max values of a datatype to substitute for the lack of
open-ended intervals. That sort of technique creates unnecessary
implementation dependencies, and magic numbers (especially ones with a
dozen or two digits in them) are bad things for readability in any case.
To take just one example that's pretty near at hand: if type date had had
an exact published max value that applications were hard-wiring into
their code, we'd not have been able to change it to add 'infinity' as a
special value.

            regards, tom lane
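Tom's date example can be made concrete with plain SQL that runs against
a current server; the literals are only illustrative. Because
applications never had a published maximum date to hard-wire, 'infinity'
could later be added as a special value that simply sorts after every
finite date:

    SELECT 'infinity'::date > '2100-01-01'::date;    -- true
    SELECT 'infinity'::timestamptz > now();          -- true, no magic max constant needed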
From: Scott Bailey on 15 Dec 2009 17:01

David Fetter wrote:
> On Tue, Dec 15, 2009 at 11:31:05AM -0800, Scott Bailey wrote:
>> Jeff Davis wrote:
>>> On Tue, 2009-12-15 at 10:19 -0500, Tom Lane wrote:
>> Would it be OK if we handled float timestamp ranges as continuous
>> and int64 timestamps discrete?
>
> That sounds like a recipe for disaster. Whatever timestamp ranges
> are, float and int64 should be treated the same way so as not to get
> "surprises" due to implementation details.
>
>> You effectively lose the ability to build non-contiguous sets with
>> continuous ranges. Which is integral to the work I'm doing (union,
>> intersect, coalesce and minus sets of ranges)
>>
>> As for the extra bits, would it be better to just require continuous
>> ranges to be either [] or [)? But I don't know which would be
>> preferred. My inclination would be toward [), but Tom seemed to
>> indicate that perhaps [] was the norm.
>
> [] makes certain operations--namely the important ones in
> calendaring--impossible, or at least incredibly kludgy, to do. I
> think we ought to leave openness at each end up to the user,
> independent of the underlying implementation details.
>
> FWIW, I think it would be a good idea to treat timestamps as
> continuous in all cases.

Ok, let me give an example of what we can do with the current
implementations that would not be possible with timestamps if we
implement as suggested. Jeff's implementation uses a 1 microsecond step
size or granule. And my implementation uses an interval step size that
can be configured database-wide, but the default is 1 second.

The function below takes two period arrays that can have overlapping and
adjacent elements. It subtracts all values in pa1 that intersect with
values in pa2. So perhaps pa1 is all of your work shifts for the month
and pa2 is a combination of your leave and holidays. The result is a
coalesced non-contiguous set of the times you would actually be working.
But to do this kind of thing you need to be able to determine prior,
first, last and next. I need an implementation that can do this for
timestamps and not just ints and dates.

CREATE OR REPLACE FUNCTION period_minus(
    pa1 IN period[],
    pa2 IN period[]
) RETURNS period[] AS
$$
    SELECT array_agg(prd)
    FROM (
        SELECT period((t_in).start_time, MIN((t_out).end_time)) AS prd
        FROM (
            SELECT DISTINCT first(p) AS start_time
            FROM unnest($1) p
            WHERE NOT contains($2, first(p))
              AND NOT contains($1, prior(p))
            UNION
            SELECT DISTINCT next(p)
            FROM unnest($2) p
            WHERE contains($1, next(p))
              AND NOT contains($2, next(p))
        ) t_in
        JOIN (
            SELECT next(p) AS end_time
            FROM unnest($1) p
            WHERE NOT contains($1, next(p))
            UNION ALL
            SELECT first(p)
            FROM unnest($2) p
            WHERE contains($1, first(p))
              AND NOT contains($2, prior(p))
        ) t_out ON t_in.start_time < t_out.end_time
        GROUP BY t_in.start_time
        ORDER BY t_in.start_time
    ) sub;
$$ LANGUAGE 'sql' IMMUTABLE STRICT;
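A usage sketch, assuming Scott's period type and helper functions are
installed; the shifts and the subtracted leave below are invented for
illustration. The first array holds the work shifts, the second the
leave, and the result is the coalesced set of times actually worked:

    SELECT period_minus(
        ARRAY[ period('2009-12-14 09:00'::timestamp, '2009-12-14 17:00'::timestamp),
               period('2009-12-15 09:00'::timestamp, '2009-12-15 17:00'::timestamp) ],
        ARRAY[ period('2009-12-15 12:00'::timestamp, '2009-12-15 13:00'::timestamp) ]
    );
    -- Expected shape of the result: the shift on the 14th intact, and the
    -- shift on the 15th split into a morning and an afternoon period.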
From: Jeff Davis on 15 Dec 2009 17:02

On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote:
> That sounds like a recipe for disaster. Whatever timestamp ranges
> are, float and int64 should be treated the same way so as not to get
> "surprises" due to implementation details.

It's a compile-time option that will change the semantics of timestamps
regardless. Anyone who changes between float and int64 timestamps may
experience problems for a number of different reasons. What was unique
before might no longer be unique, for instance.

> FWIW, I think it would be a good idea to treat timestamps as
> continuous in all cases.

I disagree. There is a lot of value in treating timestamp ranges as
discrete.

One big reason is that the ranges can be translated between the
different input/output forms, and there's a canonical form. As we know,
a huge amount of the value in an RDBMS is unifying data from multiple
applications with different conventions.

So, let's say one application uses (] and another uses [). If you are
mixing the data and returning it to the application, you want to be able
to provide the result according to its convention. You can't do that
with a continuous range.

And things get more interesting: if you mix (] and [), then range_union
will produce () and range_intersect will produce []. So now you have all
four conventions floating around the same database.

Another reason you might mix conventions: say you have log data from
several sources, and some sources provide timestamps for an event which
is essentially "instantaneous" and other sources will log the period of
time over which the event occurred, in [) format. To mix the data
coherently, the correct thing to do is call the instantaneous points a
singleton range; but the only way to represent a singleton continuous
range is by using [].

Whenever you mix conventions, you either have to be able to change the
format (which is only possible with discrete ranges) or teach the
application to understand your convention. And if you don't have a
canonical form (which is only possible with discrete ranges), you can't
reliably compare values for equality, or see if they are adjacent.

Saying that discrete ranges are unnecessary is essentially saying that
there's only a use for one convention; or that the conventions will
never be mixed; or that applications will always be smart enough to sort
it out for themselves.

Regards,
    Jeff Davis
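The translation Jeff describes is only possible because a granule gives
every bound a well-defined neighbor. With his 1-microsecond step, the
same set of instants has four equivalent spellings; the bounds below are
invented, and the arithmetic is ordinary timestamp/interval SQL:

    -- [09:00, 17:00)                         closed-open
    -- [09:00, 16:59:59.999999]               closed-closed
    -- (08:59:59.999999, 17:00)               open-open
    -- (08:59:59.999999, 16:59:59.999999]     open-closed
    SELECT '2009-12-15 17:00'::timestamp - interval '1 microsecond' AS closed_upper,
           '2009-12-15 09:00'::timestamp - interval '1 microsecond' AS open_lower;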
From: Tom Lane on 15 Dec 2009 17:17
Jeff Davis <pgsql(a)j-davis.com> writes:
> On Tue, 2009-12-15 at 11:49 -0800, David Fetter wrote:
>> FWIW, I think it would be a good idea to treat timestamps as
>> continuous in all cases.

> I disagree. There is a lot of value in treating timestamp ranges as
> discrete.

> One big reason is that the ranges can be translated between the
> different input/output forms, and there's a canonical form. As we know,
> a huge amount of the value in an RDBMS is unifying data from multiple
> applications with different conventions.

Actually, that is exactly one of the reasons why what you propose is a
*bad* idea. You want to institutionalize application dependence on a
non-portable implementation detail, namely the granularity of machine
representation of what's in principle a continuous value. That's one of
the fastest routes to non-unifiable data I can think of.

> So, let's say one application uses (] and another uses [). If you are
> mixing the data and returning it to the application, you want to be able
> to provide the result according to its convention. You can't do that
> with a continuous range.

The above is nonsense. [1,2) and [1,2] are simply different objects. A
design that assumes that it is always possible to replace one by the
other is broken so badly it's not even worth discussing. The only reason
you'd have applications that fail to handle both open and closed
intervals would be if someone were to create an implementation that
didn't support both from the outset. Which we need not and should not
do.

> And things get more interesting: if you mix (] and [), then range_union
> will produce () and range_intersect will produce []. So now you have all
> four conventions floating around the same database.

Which is why it's a good idea to support all four...

            regards, tom lane