How to pass around collation information [PgSql]

Prev: [HACKERS] How to pass around collation information
Next: [BUGS] dividing money by money

From: Heikki Linnakangas on 28 May 2010 17:18

On 28/05/10 23:15, Robert Haas wrote:
> On Fri, May 28, 2010 at 3:20 PM, Peter Eisentraut<peter_e(a)gmx.net> wrote:
>> On fre, 2010-05-28 at 15:03 -0400, Robert Haas wrote:
>>> I think we need to think of the comparison operators as ternary, and
>>> the COLLATE syntax applied to columns or present in queries as various
>>> ways of setting defaults or explicit overrides for what the third
>>> argument will end up being.
>>
>> How could this extend to things like isalpha() or upper() that would
>> need access to ctype information?
>
> Good question. :-(

Strictly speaking, collation and ctype are two different things. Which
is a convenient way to evade the question :-).

But you could ask, how would we handle more fine-grained ctype in
upper() then? Perhaps by adding a second argument for ctype. Similarly
to to_tsvector([config, ] string), you could explicitly pass the ctype
as an argument, or leave it out in which case a default is used. It
wouldn't give you per-column ctype, though.

What does the spec have to say about the ctype used for upper() et al BTW?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Martijn van Oosterhout on 28 May 2010 17:23

On Fri, May 28, 2010 at 10:32:34PM +0300, Peter Eisentraut wrote:
> On fre, 2010-05-28 at 14:48 -0400, Tom Lane wrote:
> > > SELECT * FROM test WHERE a COLLATE en > 'baz' ORDER BY b COLLATE sv;
> >
> > That seems fairly bizarre. What does this mean:
> >
> > WHERE a COLLATE en > b COLLATE de
> >
> > ? If it's an error, why is this not an error
> >
> > WHERE a COLLATE en > b
> >
> > if b is marked as COLLATE de in its table?
>
> The way I understand it, a collation "derivation" can be explicit or
> implicit. Explicit derivations override implicit derivations. If in
> the argument set of an operation, explicit collation derivations exist,
> they must all be the same.

The SQL standard has an explicit set of rules for determining the
collations of any particular operation (they apply to
operators/functions not to the datums).

The basic idea is that tables/columns/data types define an implicit
collation, which can be overidden by explicit collations. If there is
ambiguity you throw an error. I implemented this all several years ago,
it's not all that complicated really. IIRC I added a field to the Node type
and each level determined it's collection from the sublevels.

I solved the problem for the OP by providing an extra function to user
defined functions which would return the collation for that particular
call.

The more interesting question I found was that the standard only
defined collation for strings, whereas it can be applied much more
broadly. I described a possible solution several years back, it should
in the archives somewhere. It worked pretty well as I recall.

IIRC The idea was to let each type has its own set of collations and
when using an operator/function you let the collection be determined
by the argument that had the same type as the return type.

It would be nice if COLLATE could finally be implemented, it'd be quite
useful.

Have a nice day,
--
Martijn van Oosterhout <kleptog(a)svana.org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle

From: Peter Eisentraut on 1 Jun 2010 16:50

On lör, 2010-05-29 at 00:18 +0300, Heikki Linnakangas wrote:
> What does the spec have to say about the ctype used for upper() et al
> BTW?

It doesn't make any provisions for locale dependencies for that.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Peter Eisentraut on 2 Jun 2010 15:46

On fre, 2010-05-28 at 20:59 +0300, Peter Eisentraut wrote:
> The feature I'm thinking of is what
> people might call "per-column locale", and the SQL standard defines
> that. It would look like this:
>
> CREATE TABLE test (
> a varchar COLLATE de,
> b varchar COLLATE fr
> );
>
> SELECT * FROM test WHERE a > 'baz' ORDER BY b;

Perhaps it's also worth pointing out there could be use cases other than
supporting multiple natural languages. For example, it is frequently
requested to be able to sort in ways that doesn't ignore special
characters, binary sort, or perhaps special file name sort that treats
'/' special in some way. So it could be quite useful to be able to say

CREATE TABLE something (
description text COLLATE en,
code char(6) COLLATE binary,
file text COLLATE filename_sort
);

or even something like

CREATE DOMAIN filename AS text COLLATE filename_sort;

These are examples where having the collation attached to the column
would appear to make more sense then having it attached only to
operations.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 2 Jun 2010 16:56

On Wed, Jun 2, 2010 at 3:46 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> On fre, 2010-05-28 at 20:59 +0300, Peter Eisentraut wrote:
>> The feature I'm thinking of is what
>> people might call "per-column locale", and the SQL standard defines
>> that. It would look like this:
>>
>> CREATE TABLE test (
>> a varchar COLLATE de,
>> b varchar COLLATE fr
>> );
>>
>> SELECT * FROM test WHERE a > 'baz' ORDER BY b;
>
> Perhaps it's also worth pointing out there could be use cases other than
> supporting multiple natural languages. For example, it is frequently
> requested to be able to sort in ways that doesn't ignore special
> characters, binary sort, or perhaps special file name sort that treats
> '/' special in some way. So it could be quite useful to be able to say
>
> CREATE TABLE something (
> description text COLLATE en,
> code char(6) COLLATE binary,
> file text COLLATE filename_sort
> );
>
> or even something like
>
> CREATE DOMAIN filename AS text COLLATE filename_sort;
>
> These are examples where having the collation attached to the column
> would appear to make more sense then having it attached only to
> operations.

But in the end the only purpose of setting it on a column is to set
which one will be used for operations on that column. And the user
might still override it for a particular query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4
Prev: [HACKERS] How to pass around collation information
Next: [BUGS] dividing money by money