From: alvherre on 28 May 2010 13:05 Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010: > Option 2, invent some new mechanism that accompanies a datum or a type > whereever it goes. Kind of like typmod, but not really. Then the > collation information would presumably be made available to functions > through the fmgr interface. The binary representation of data values > stays the same. Is the collation a property of the datum, or one of the comparison? If the latter, should it be really be made a sidecar of a datum, or would it make more sense to attach it to the operation being performed? I wonder if instead of trying to pass it down multiple layers till bttextcmp and further down, it would make more sense to set a global variable somewhere in the high levels, and only have it checked in varstr_cmp. -- Álvaro Herrera <alvherre(a)commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 28 May 2010 13:22 On 28/05/10 19:27, Peter Eisentraut wrote: > I have been thinking about this collation support business a bit. > Ignoring for the moment where we would get the actual collation routines > from, I wonder how we are going to pass this information around in the > system. Someone declares a collation on a column in a table, and > somehow this information needs to arrive in bttextcmp() and friends. Yes. Comparison operators need it, as do functions like isalpha(). > Also, functions that take in a string and return one (e.g., substring), > need to take in this information and return it back out. How should > this work? Hmm, I don't see what substring would need collation for. And it certainly shouldn't be returning it. Collation is a property of the comparison operators (and isalpha etc.), and the planner needs to deduce the right collation for each such operation in the query. That involves looking at the tables and columns involved, as well as per-user information and any explicit COLLATE clauses in the query, but all that happens at plan-time. > Option 1, make it part of the datum. That way it will pass through the > system just fine, but it would waste a lot of storage and break just > about everything that operates on string types now, as well as > pg_upgrade. So that's probably out. It's also fundamentally wrong, collation is not a property of a datum but of the operation. > Option 2, invent some new mechanism that accompanies a datum or a type > whereever it goes. Kind of like typmod, but not really. Then the > collation information would presumably be made available to functions > through the fmgr interface. The binary representation of data values > stays the same. Something like that. I'm thinking that bttextcmp() and friends will simply take an extra argument indicating the collation, and we'll teach the operator / operator class infrastructure about that too. One way to approach this is to realize that it's already possible to use multiple collations in a database. You just have to define separate < = > operators and operator classes for every collation, and change all your queries to use the right operator depending on the desired collation everywhere where you use < = > (including ORDER BYs, with the USING <operator> syntax). The behavior is exactly what we want, it's just completely inpractical, so we need something to do the same in a less cumbersome way. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Peter Eisentraut on 28 May 2010 13:59 On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote: > It's also fundamentally wrong, collation is not a property of a datum > but of the operation. > One way to approach this is to realize that it's already possible to > use > multiple collations in a database. You just have to define separate < > = > > operators and operator classes for every collation, and change all > your queries to use the right operator depending on the desired > collation everywhere where you use < = > (including ORDER BYs, with > the > USING <operator> syntax). The behavior is exactly what we want, it's > just completely inpractical, so we need something to do the same in a > less cumbersome way. Well, maybe we should step back a little and work out what sort of feature we actually want, if any. The feature I'm thinking of is what people might call "per-column locale", and the SQL standard defines that. It would look like this: CREATE TABLE test ( a varchar COLLATE de, b varchar COLLATE fr ); SELECT * FROM test WHERE a > 'baz' ORDER BY b; So while it's true that the collation is used by the operations (> and ORDER BY), the information which collation to use comes with the data values. It's basically saying, a is in language "de", so sort it like that unless told otherwise. There is also an override syntax available, like this: SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv; But here again the collation is attached to a data value, and only from there it is passed to the operator. What is actually happening is SELECT * FROM test WHERE (a COLLATE en) > 'baz' ORDER BY (b COLLATE sv); What you appear to be describing is a "per-operation locale", which also sounds valid, but it would be a different thing. It might be thought of as this: SELECT * FROM test WHERE a (> COLLATE en) 'baz' ORDER BY COLLATE sv b; with some suitable global default. So which one of these should it be? -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 28 May 2010 14:48 Peter Eisentraut <peter_e(a)gmx.net> writes: > So while it's true that the collation is used by the operations (> and > ORDER BY), the information which collation to use comes with the data > values. It's basically saying, a is in language "de", so sort it like > that unless told otherwise. There is also an override syntax available, > like this: > SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv; That seems fairly bizarre. What does this mean: WHERE a COLLATE en > b COLLATE de ? If it's an error, why is this not an error WHERE a COLLATE en > b if b is marked as COLLATE de in its table? I guess the more general question is whether the spec expects that collation settings can be derived statically (like type information) or whether they might sometimes only be known at runtime. We also need to think about whether we're okay with only applying collation to built-in types (text, varchar, char) or whether we need the feature to work for add-on types as well. In particular, is citext still a meaningful feature if we have this, or is it superseded by COLLATE? In the abstract I'd prefer to let it work for user-defined types, but if we can have a much simpler implementation by not doing so, it might be better to give that up. Is COLLATE a property that can be attached to a domain over text? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 28 May 2010 15:03 On Fri, May 28, 2010 at 2:48 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e(a)gmx.net> writes: >> So while it's true that the collation is used by the operations (> and >> ORDER BY), the information which collation to use comes with the data >> values. It's basically saying, a is in language "de", so sort it like >> that unless told otherwise. There is also an override syntax available, >> like this: > >> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv; > > That seems fairly bizarre. What does this mean: > > WHERE a COLLATE en > b COLLATE de > > ? If it's an error, why is this not an error > > WHERE a COLLATE en > b > > if b is marked as COLLATE de in its table? I think we need to think of the comparison operators as ternary, and the COLLATE syntax applied to columns or present in queries as various ways of setting defaults or explicit overrides for what the third argument will end up being. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
|
Next
|
Last
Pages: 1 2 3 4 Prev: [HACKERS] How to pass around collation information Next: [BUGS] dividing money by money |