Per-column collation, proof of concept [PgSql]


From: "Kevin Grittner" on 14 Jul 2010 13:21

Peter Eisentraut <peter_e(a)gmx.net> wrote:

> Here is a proof of concept for per-column collation support.

Did you want a WIP review of that patch? (The CommitFest is closing to
new submissions soon.)

-Kevin

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on 14 Jul 2010 13:35

Hello

I have only one question. If I understand correctly, you can use
collation only for sorting. What is your plan for range search
operations? Sorting is interesting, and I am sure it is important for
multilingual applications, but for me case-sensitive/case-insensitive
and accent-sensitive/accent-insensitive filtering is more important.
Do you have a plan for that?

Regards

Pavel Stehule

2010/7/13 Peter Eisentraut <peter_e(a)gmx.net>:
> Here is a proof of concept for per-column collation support.
>
> Here is how it works: When creating a table, an optional COLLATE clause
> can specify a collation name, which is stored (by OID) in pg_attribute.
> This becomes part of the type information and is propagated through the
> expression parse analysis, like typmod. When an operator or function
> call is parsed (transformed), the collations of the arguments are
> unified, using some rules (like type analysis, but different in detail).
> The collations of the function/operator arguments come either from Var
> nodes which in turn got them from pg_attribute, or from other
> function and operator calls, or you can override them with explicit
> COLLATE clauses (not yet implemented, but will work a bit like
> RelabelType). At the end, each function or operator call gets one
> collation to use.
>

What about the DISTINCT clause, and maybe the GROUP BY clause?

regards

Pavel

> The function call itself can then look up the collation using the
> fcinfo->flinfo->fn_expr field. (Works for operator calls, but doesn't
> work for sort operations, needs more thought.)
>
> A collation is in this implementation defined as an lc_collate string
> and an lc_ctype string. The implementation of functions interested in
> that information, such as comparison operators, or upper and lower
> functions, will take the collation OID that is passed in, look up the
> locale string, and use the xlocale.h interface (newlocale(),
> strcoll_l()) to compute the result.
>
> (Note that the xlocale stuff is only 10 or so lines in this patch. It
> should be feasible to allow other appropriate locale libraries to be
> used.)
>
> Loose ends:
>
> - Support function calls (currently only operator calls) (easy)
>
> - Implementation of sort clauses
>
> - Indexing support/integration
>
> - Domain support (should be straightforward)
>
> - Make all expression node types deal with collation information
> appropriately
>
> - Explicit COLLATE clause on expressions
>
> - Caching and not leaking memory of locale lookups
>
> - I have typcollatable to mark which types can accept collation
> information, but perhaps there should also be proicareaboutcollation
> to skip collation resolution when none of the functions in the
> expression tree care.
>
> You can start by reading the collate.sql regression test file to see
> what it can do. Btw., regression tests only work with "make check
> MULTIBYTE=UTF8". And it (probably) only works with glibc for now.
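
The quoted description of looking up a locale string and calling newlocale() and strcoll_l() can be illustrated with a small standalone sketch. It is not code from the patch. The helper name compare_with_collation is hypothetical, the lookup from a collation OID to a locale string is left out (the caller passes the lc_collate string directly), and it assumes a POSIX.1-2008 system such as glibc, where newlocale() is declared in <locale.h> and strcoll_l() in <string.h> (some platforms need <xlocale.h> instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>   /* newlocale, freelocale, locale_t, LC_COLLATE_MASK */
#include <string.h>   /* strcoll_l */

/*
 * Hypothetical helper, not taken from the patch: compare a and b under
 * the collation named by lc_collate.  On success, store the strcoll_l()
 * result in *result and return 0; return -1 if the locale cannot be
 * created (for example, because it is not installed).
 */
int
compare_with_collation(const char *a, const char *b,
                       const char *lc_collate, int *result)
{
    /* Build a locale object that only overrides the collation category. */
    locale_t loc = newlocale(LC_COLLATE_MASK, lc_collate, (locale_t) 0);

    if (loc == (locale_t) 0)
        return -1;

    *result = strcoll_l(a, b, loc);

    /* Free the locale right away so that nothing leaks. */
    freelocale(loc);
    return 0;
}
```

This sketch creates and frees a locale object on every call, which keeps it leak-free but is probably too slow for real use. That is presumably why caching of locale lookups appears among the loose ends.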

From: Peter Eisentraut on 14 Jul 2010 17:11

On Wed, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
> I have only one question. If I understand correctly, you can use
> collation only for sorting. What is your plan for range search
> operations?

My patch does range searches. Sorting uses the same operators, so both
will be supported. (Sorting is not yet implemented, as I had written.)

> Sorting is interesting, and I am sure it is important for
> multilingual applications, but for me case-sensitive/case-insensitive
> and accent-sensitive/accent-insensitive filtering is more important.
> Do you have a plan for that?

You may be able to do some of these by using appropriate locale
definitions. I'd need some examples to be able to tell for sure.

> What about the DISTINCT clause, and maybe the GROUP BY clause?

DISTINCT and GROUP BY work with equality, which is not affected by
locales (at least under the current rules).


From: Pavel Stehule on 14 Jul 2010 23:57

2010/7/14 Peter Eisentraut <peter_e(a)gmx.net>:
> On Wed, 2010-07-14 at 19:35 +0200, Pavel Stehule wrote:
>> I have only one question. If I understand correctly, you can use
>> collation only for sorting. What is your plan for range search
>> operations?
>
> My patch does range searches. Sorting uses the same operators, so both
> will be supported. (Sorting is not yet implemented, as I had written.)
>
>> Sorting is interesting, and I am sure it is important for
>> multilingual applications, but for me case-sensitive/case-insensitive
>> and accent-sensitive/accent-insensitive filtering is more important.
>> Do you have a plan for that?
>
> You may be able to do some of these by using appropriate locale
> definitions. I'd need some examples to be able to tell for sure.
>
>> What about the DISTINCT clause, and maybe the GROUP BY clause?
>
> DISTINCT and GROUP BY work with equality, which is not affected by
> locales (at least under the current rules).
>

:( Maybe we have to enhance the locales, or do some work in this
direction. In Czech information systems, an operation like this is
relatively common:

name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
accent insensitive

PostgreSQL is the last database that doesn't have integrated support
for it.

Regards

Pavel


From: Peter Eisentraut on 15 Jul 2010 04:44

On Thu, 2010-07-15 at 05:57 +0200, Pavel Stehule wrote:
> :( Maybe we have to enhance the locales, or do some work in this
> direction. In Czech information systems, an operation like this is
> relatively common:
>
> name = 'Stěhule' COLLATION cs_CZ_cs_ai -- compare case insensitive
> accent insensitive
>
> PostgreSQL is the last database that doesn't have integrated support
> for it.

Well, the comparison function varstr_cmp() contains this comment:

/*
 * In some locales strcoll() can claim that nonidentical strings are
 * equal. Believing that would be bad news for a number of reasons,
 * so we follow Perl's lead and sort "equal" strings according to
 * strcmp().
 */

This might not be strictly necessary, seeing that citext obviously
doesn't work that way, but resolving this is really an orthogonal issue.
If you fix that and you have a locale that does what you want, my patch
will help you get your example working.
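
The tie-break rule that the quoted comment describes can be sketched in a few lines. This is a simplified illustration, not the actual varstr_cmp() code. The function name collate_cmp_with_tiebreak is hypothetical, and the sketch assumes NUL-terminated strings and the process-wide locale.

```c
#include <string.h>   /* strcoll, strcmp */

/*
 * Hypothetical, simplified illustration of the rule in the comment:
 * compare by the locale's collation first, and if the locale claims
 * the two strings are equal, fall back to a bytewise comparison.
 */
int
collate_cmp_with_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    /* Break ties so that only identical strings compare as equal. */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

Under this rule two strings compare as equal only when they are byte-for-byte identical, whatever the locale says. That is consistent with the earlier remark that equality is not affected by locales, and it suggests why a case-insensitive or accent-insensitive locale alone would not make the `name = 'Stěhule'` example match.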
