How to pass around collation information [PgSql]

Prev: [HACKERS] How to pass around collation information
Next: [BUGS] dividing money by money

From: Peter Eisentraut on 28 May 2010 15:20

On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
> I think we need to think of the comparison operators as ternary, and
> the COLLATE syntax applied to columns or present in queries as various
> ways of setting defaults or explicit overrides for what the third
> argument will end up being.

How could this extend to things like isalpha() or upper() that would
need access to ctype information?

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on 28 May 2010 15:32

On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
> > SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
>
> That seems fairly bizarre. What does this mean:
>
> WHERE a COLLATE en > b COLLATE de
>
> ? If it's an error, why is this not an error
>
> WHERE a COLLATE en > b
>
> if b is marked as COLLATE de in its table?

The way I understand it, a collation "derivation" can be explicit or
implicit. Explicit derivations override implicit derivations. If in
the argument set of an operation, explicit collation derivations exist,
they must all be the same.

> I guess the more general question is whether the spec expects that
> collation settings can be derived statically (like type information)
> or whether they might sometimes only be known at runtime.

It looks like it is treated like type information. The derivation and
validation rules are part of the Syntax Rules.

> We also need to think about whether we're okay with only applying
> collation to built-in types (text, varchar, char) or whether we need
> the feature to work for add-on types as well. In particular, is citext
> still a meaningful feature if we have this, or is it superseded by
> COLLATE? In the abstract I'd prefer to let it work for user-defined
> types, but if we can have a much simpler implementation by not doing
> so, it might be better to give that up.

I think if we get this done using the strcoll_l() API to do the work,
which looks like the path of least resistance at the moment, citext
would still be useful because all the standard locales would still be
case sensitive.

> Is COLLATE a property that can be attached to a domain over text?

According to the spec, yes.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on 28 May 2010 15:53

Hi,

Peter Eisentraut <peter_e(a)gmx.net> writes:
> On fre, 2010-05-28 at 20:22 +0300, Heikki Linnakangas wrote:
>> USING <operator> syntax). The behavior is exactly what we want, it's
>> just completely inpractical, so we need something to do the same in a
>> less cumbersome way.

For an example, here is something I did to better understand the system
a while ago. Of course I never got to use it for real:

http://pgsql.tapoueh.org/btree_fr_ops/

> Well, maybe we should step back a little and work out what sort of
> feature we actually want, if any. The feature I'm thinking of is what
> people might call "per-column locale", and the SQL standard defines
> that. It would look like this:
>
> CREATE TABLE test (
> a varchar COLLATE de,
> b varchar COLLATE fr
> );
>
> SELECT * FROM test WHERE a > 'baz' ORDER BY b;
>
> So while it's true that the collation is used by the operations (> and
> ORDER BY), the information which collation to use comes with the data
> values. It's basically saying, a is in language "de", so sort it like
> that unless told otherwise. There is also an override syntax available,
> like this:
>
> SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
>
> But here again the collation is attached to a data value, and only from
> there it is passed to the operator. What is actually happening is
>
> SELECT * FROM test WHERE (a COLLATE en) > 'baz' ORDER BY (b COLLATE sv);
>
>
> What you appear to be describing is a "per-operation locale", which also
> sounds valid, but it would be a different thing. It might be thought of
> as this:
>
> SELECT * FROM test WHERE a (> COLLATE en) 'baz' ORDER BY COLLATE sv b;
>
> with some suitable global default.
>
>
> So which one of these should it be?

My understanding is that what we do is per-operation locale. The locale
information bears no semantic when not attached to some operation on
strings, like sorting or comparing.

So what you're showing here I think is how to attach a collation label
to every string in the system, at the catalog level or dynamically at
the query level.

Now this collation label will only be used whenever you want to use a
collation aware function or operator. Those functions need to get the
labels for their implementation to have the expected meaning.

So we need both to attach collations to all known strings (defaulting to
the current database collation I guess), as you showed at the SQL level,
and to pass this information down to the functions operating on those
strings.

A confusing example on this grounds would be the following, which I hope
the standard disallow:

SELECT * FROM test WHERE a COLLATE en > b COLLATE sv;

Regards,
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 28 May 2010 16:15

On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
>> I think we need to think of the comparison operators as ternary, and
>> the COLLATE syntax applied to columns or present in queries as various
>> ways of setting defaults or explicit overrides for what the third
>> argument will end up being.
>
> How could this extend to things like isalpha() or upper() that would
> need access to ctype information?

Good question. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on 28 May 2010 16:23

2010/5/28 alvherre <alvherre(a)commandprompt.com>:
> Excerpts from Peter Eisentraut's message of vie may 28 12:27:52 -0400 2010:
>
>> Option 2, invent some new mechanism that accompanies a datum or a type
>> whereever it goes. Â Kind of like typmod, but not really. Â Then the
>> collation information would presumably be made available to functions
>> through the fmgr interface. Â The binary representation of data values
>> stays the same.
>
> Is the collation a property of the datum, or one of the comparison?
> If the latter, should it be really be made a sidecar of a datum, or
> would it make more sense to attach it to the operation being performed?

>
> I wonder if instead of trying to pass it down multiple layers till
> bttextcmp and further down, it would make more sense to set a global
> variable somewhere in the high levels, and only have it checked in
> varstr_cmp.
>

Maybe collation is property of some operation: func call, sort, ... I
prefer to put collation info to FunctionCallInfo structure. Usually
you cannot change collation per row - collation is attached to
expression.

Regards

Pavel
> --
> Ãlvaro Herrera <alvherre(a)commandprompt.com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [HACKERS] How to pass around collation information
Next: [BUGS] dividing money by money