From: Jeff Davis on 16 Dec 2009 14:29

On Wed, 2009-12-16 at 12:50 -0500, Tom Lane wrote:
> I'm still not exactly clear on what the use-case is for discrete
> timestamp ranges, and I wonder how many people are going to be happy
> with a representation that can't handle a range that's open-ended
> on the left.

Huh? We're miscommunicating somewhere.

Discrete ranges are values, and those values can be displayed a number of different ways. That's one of the biggest advantages.

The very same discrete range can be displayed as open-open, closed-open, open-closed, or closed-closed. There are edge cases, like how infinity is never closed, and overflow conditions. But generally speaking, you have more options for presenting a discrete range than a continuous range.

The range [5, 7) is equal to the set {5, 6} and equal to the ranges: (4,7), (4,6], [5,7), and [5,6]. One application can insert it as [5,7) and another can read it as (4,6].

That's the use case: the application's preferences don't have to match. It's OK to mix various representation preferences, because you can convert between them. The on-disk format happens to hint at one particular canonical form, but doesn't enforce that on anyone.

> Huh? You're not going to be able to have a special case data
> representation for one or two data types at the same time as you have a
> function-based datatype-independent concept of a parameterized range
> type. Well, maybe you could have special code paths for just date and
> timestamp but it'd be horrid.

They aren't supposed to be exactly the same API; I said that from the start. There are API differences between continuous and discrete ranges, and we shouldn't ignore them.

One important difference is that (barring overflow conditions and special values) prior, first, last, and next are defined for all discrete range values, but not for all continuous range values.

For instance, the discrete range [5,7) has prior=4, first=5, last=6, next=7, whereas the continuous range [5,7) has prior=undef, first=5, last=undef, next=7.

We could define one API that treats discrete and continuous ranges differently. But you'll never be able to transform a continuous range to a different representation, while you can do so with a discrete range.

> More importantly, the notion of a representation granule is still 100%
> wishful thinking for any inexact-representation datatype, which is going
> to be a severe crimp in getting this accepted for timestamp, let alone
> defining it in a way that would allow users to try to apply it to
> floats. Float timestamps might not be the default case anymore but they
> are still supported.

If the only objection is that a superuser can confuse the system by poorly defining a range type on a non-default build, I think that objection can be overcome.

> I think you should let go of the feeling that you have to shave bytes
> off the storage format. You're creating a whole lot of work for
> yourself and a whole lot of user-visible corner cases in return for
> what ultimately isn't much.

This isn't just to shave bytes. It's also because I like the semantics of discrete ranges for many cases.

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
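
The equivalence Jeff describes for discrete ranges can be illustrated with a minimal Python sketch. This is not code from the thread or from PostgreSQL: the `DiscreteRange` class, its closed-open canonical form, and the `parse`/`render` helpers are all hypothetical names chosen for illustration, and only integers are handled (no infinity or overflow cases).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteRange:
    """Hypothetical integer range, stored canonically as closed-open [first, next)."""
    first: int  # smallest value inside the range
    next: int   # smallest value after the range

    @classmethod
    def parse(cls, text: str) -> "DiscreteRange":
        """Parse '[5,7)', '(4,6]', and so on into the canonical form."""
        lo, hi = (int(part) for part in text[1:-1].split(","))
        first = lo if text[0] == "[" else lo + 1  # open lower bound excludes lo
        nxt = hi + 1 if text[-1] == "]" else hi   # closed upper bound includes hi
        return cls(first, nxt)

    @property
    def prior(self) -> int:
        """Largest value before the range."""
        return self.first - 1

    @property
    def last(self) -> int:
        """Largest value inside the range."""
        return self.next - 1

    def render(self, style: str) -> str:
        """Render with the requested brackets, e.g. '[)' or '(]'."""
        lo = self.first if style[0] == "[" else self.prior
        hi = self.last if style[1] == "]" else self.next
        return f"{style[0]}{lo},{hi}{style[1]}"
```

With this sketch, one application could call `DiscreteRange.parse("[5,7)")` and another could call `render("(]")` on the same value to read it back as `(4,6]`. A continuous range has no `prior` or `last`, so no comparable conversion exists for it.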
From: Alvaro Herrera on 16 Dec 2009 14:45

tomas(a)tuxteam.de wrote:
> (and as Andrew Dunstan pointed out off-list: I was wrong with my bold
> assertion that one can squeeze infinitely many (arbitrary length)
> strings between two given. This is not always the case).

Of course you can do that if you assume lexicographical order, or any other arbitrary order. The interesting point is whether there exists some ordering on which this does not happen. And in fact there is: order strings by length first, and then lexicographically. If you do this then you have next() and prev() for any given string. If you use ASCII only, you have a.next = b, and so on.

There is the argument that some languages do not sort lexicographically but this is also beside the point -- you only need to find *some* way to sort the characters in the alphabet. If you dictate that in your ordering "�" comes before "�" and both after "a", and all of them before b, then you know that a.next = � and �.next = � and �.next = b. (Note that I have also dictated that there is no other character that sorts after a and before b, which is perfectly possible because the alphabet is fixed for any given language. You could use the complete list of characters coded in a given set of Unicode planes, or even all extant planes, to obtain the same result).

Defining strings with this ordering means you can have some useful ranges like [a-z], but then you cannot meaningfully use ranges for things like [aardvark - zulu] --- note that in this particular example, the order is reversed, because zulu comes before aardvark which is probably not what you want.

zzz.next = aaaa

In short, I think that while it is possible to define ranges of strings, it is not as useful as one would like.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
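
The length-first ordering Alvaro describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabet is taken to be lowercase ASCII, and `ALPHABET`, `shortlex_key`, and `shortlex_next` are hypothetical names, not anything proposed in the thread.

```python
import string

# Assumed alphabet; any fixed, totally ordered alphabet would work the same way.
ALPHABET = string.ascii_lowercase


def shortlex_key(s: str) -> tuple:
    """Sort key: shorter strings first, then alphabet order."""
    return (len(s), [ALPHABET.index(ch) for ch in s])


def shortlex_next(s: str) -> str:
    """Return the string immediately after s in length-then-lexicographic order."""
    chars = list(s)
    # Increment like an odometer, starting from the rightmost character.
    for i in range(len(chars) - 1, -1, -1):
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        chars[i] = ALPHABET[0]  # wrap this position and carry to the left
    # Every position wrapped, so the successor is one character longer.
    return ALPHABET[0] * (len(s) + 1)
```

Here `shortlex_next("zzz")` yields `"aaaa"`, matching the example above, and `shortlex_key("zulu") < shortlex_key("aardvark")` holds because the shorter string sorts first, which is the reversal that makes [aardvark - zulu] awkward under this ordering.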
From: Martijn van Oosterhout on 16 Dec 2009 15:07

On Tue, Dec 15, 2009 at 04:29:26PM -0800, Jeff Davis wrote:
> On Tue, 2009-12-15 at 18:06 -0600, decibel wrote:
> > Now that varlena's don't have an enormous fixed overhead, perhaps it's
> > worth looking at using them. Obviously some operations would be
> > slower, but for your stated examples of auditing and history, I
> > suspect that you're not going to notice the overhead that much.
>
> For most varvarlena types, you only get stuck with the full alignment
> burden if you get unlucky. In this case, we're moving from 16 bytes to
> 17, which really means 24 bytes with alignment. Try creating two tables:
>
> create table foo(i int8, t1 timestamp, t2 timestamp);
> create table bar(i int8, c "char", t1 timestamp, t2 timestamp);

But a period type will take just one or two more bytes if you don't require alignment. Alignment on a varlena type seems silly anyway, since you'll be aligning the header byte rather than the content.

In the implementation you may need to copy the content before processing to satisfy the alignment of the contained type, but that's just a SMOP.

Have a nice day,
--
Martijn van Oosterhout <kleptog(a)svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
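
The "16 bytes to 17, which really means 24" arithmetic quoted above can be checked with a small sketch. It assumes 8-byte alignment for the aligned case, which is typical of 64-bit builds but is an assumption here; `align_up` is a hypothetical helper, not a PostgreSQL function.

```python
def align_up(size: int, alignment: int = 8) -> int:
    """Round size up to the next multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


# Two 8-byte timestamps fit exactly; one extra byte spills into the next slot.
two_timestamps = align_up(16)        # stays at 16
with_flag_byte = align_up(17)        # padded up to 24

# With byte alignment (alignment=1) nothing is padded, which is the case
# Martijn argues for: the value costs only the one or two extra bytes.
unaligned = align_up(17, alignment=1)
```

Under the 8-byte assumption the extra byte costs a full 8 bytes of storage, while with byte alignment it costs exactly one.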
From: Martijn van Oosterhout on 16 Dec 2009 15:24

On Wed, Dec 16, 2009 at 10:57:19AM -0800, Scott Bailey wrote:
> Ok, silly question here. But how do you determine the length of a
> continuous range? By definition length of [a, b) and (a, b] = b-a. But
> what about (a,b) and [a,b]? Are we saying that because they are
> continuous, the difference between values included in the range and
> those excluded are so infinitesimally small so as not to matter? Thus
> length (a,b) == length [a,b] == length [a,b)? And if that is the case,
> does the inclusiveness of the range really even matter?

Short answer: Yes

Longer answer: You need to decide on your definition of "length", and what you usually use is the "measure". And yes, the difference between the two is a so-called "measure 0" set and thus has no effect on the length. Note the measure has to be taken considering the intervals as intervals on the real line. The integers by themselves have measure zero (they are countable). So for the "length" of a set of integers you might consider the count of the set.

http://planetmath.org/encyclopedia/ProofThatTheOuterLebesgueMeasureOfAnIntervalIsItsLength.html
http://en.wikipedia.org/wiki/Outer_measure

As for "continuous", the way you use it above is not one I recognise. There are contiguous sets, but they are something else.

Have a nice day,
--
Martijn van Oosterhout <kleptog(a)svana.org> http://svana.org/kleptog/
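
The distinction between the two notions of "length" can be made concrete with a short sketch. Both function names are hypothetical, and the code only handles finite bounds: `measure` ignores the bracket style entirely, while `integer_count` depends on it.

```python
def measure(lo: float, hi: float) -> float:
    """Length of a real interval; identical for (), [), (] and []."""
    return max(hi - lo, 0.0)


def integer_count(lo: int, hi: int, bounds: str = "[)") -> int:
    """Number of integers in the interval with the given bracket style."""
    first = lo if bounds[0] == "[" else lo + 1  # open lower bound skips lo
    last = hi if bounds[1] == "]" else hi - 1   # open upper bound skips hi
    return max(last - first + 1, 0)
```

For the bounds 5 and 7, `measure` gives 2 regardless of inclusiveness, whereas `integer_count` gives 1 for `()`, 2 for `[)` or `(]`, and 3 for `[]`. So inclusiveness is irrelevant to the measure but not to the count.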
From: Tom Lane on 16 Dec 2009 15:26

Alvaro Herrera <alvherre(a)commandprompt.com> writes:
> In short, I think that while it is possible to define ranges of strings,
> it is not as useful as one would like.

Note it is not the *range* that is the problem, it is the assumption that there's a unique "next" string. There's no unique next in the reals or rationals either, but we have no problem considering intervals over those sets.

regards, tom lane
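
Tom's point about the rationals can be sketched as follows: there is always another value strictly between any two distinct rationals, so no "next" exists, yet interval membership is still perfectly well defined. The helper names `between` and `contains` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction


def between(a: Fraction, b: Fraction) -> Fraction:
    """Return a rational strictly between distinct a and b (their midpoint)."""
    return (a + b) / 2


def contains(lo: Fraction, hi: Fraction, x: Fraction) -> bool:
    """Membership test for the closed-open interval [lo, hi)."""
    return lo <= x < hi


# Any candidate for "the next value after 5" can always be undercut.
low = Fraction(5)
candidate = Fraction(11, 2)
for _ in range(10):
    candidate = between(low, candidate)  # still strictly greater than 5
```

After the loop `candidate` is still greater than 5 and still inside [5, 7), which shows that an interval needs only a comparison operator, not a successor function.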