From: Tom Lane on 15 Jul 2010 11:24

Peter Eisentraut <peter_e(a)gmx.net> writes:
> Well, the comparison function varstr_cmp() contains this comment:
> /*
>  * In some locales strcoll() can claim that nonidentical strings are
>  * equal.  Believing that would be bad news for a number of reasons,
>  * so we follow Perl's lead and sort "equal" strings according to
>  * strcmp().
>  */
> This might not be strictly necessary, seeing that citext obviously
> doesn't work that way, but resolving this is really an orthogonal issue.

The problem with not doing that is it breaks hashing --- hash joins and
hash aggregation being the real pain points.

citext works around this in a rather klugy fashion by decreeing that two
strings are equal iff their str_tolower() conversions are bitwise equal.
So it can hash the str_tolower() representation.  But that's kinda slow
and it fails in the general case anyhow, I think.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
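[Editor's note: the citext workaround Tom describes can be sketched in C. Equality is defined as bitwise equality of the lowercased copies, and the hash is computed over that same lowercased form, so any two "equal" strings necessarily land in the same hash bucket. This is a simplified, ASCII-only illustration using a djb2-style hash; the function names are invented here, and real citext lowercases through the database locale rather than per-byte tolower().]

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Lowercase a copy of s; caller frees.  (citext's str_tolower() is
 * locale-aware; this ASCII-only version just shows the idea.) */
char *ci_lower(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(n + 1);
    for (size_t i = 0; i <= n; i++)          /* <= copies the NUL too */
        out[i] = (char) tolower((unsigned char) s[i]);
    return out;
}

/* Equality: bitwise comparison of the lowercased representations. */
int ci_equal(const char *a, const char *b)
{
    char *la = ci_lower(a), *lb = ci_lower(b);
    int eq = strcmp(la, lb) == 0;
    free(la);
    free(lb);
    return eq;
}

/* Hash the lowercased representation (djb2), so ci_equal() strings
 * always hash alike -- the property hash joins depend on. */
uint32_t ci_hash(const char *s)
{
    char *ls = ci_lower(s);
    uint32_t h = 5381;
    for (const char *p = ls; *p; p++)
        h = h * 33 + (unsigned char) *p;
    free(ls);
    return h;
}
```

The extra allocation and pass over the string on every hash call is the slowness Tom mentions; case mappings such as Turkish dotted/dotless i or German ß are where per-character lowercasing breaks down in the general case.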
From: Greg Stark on 15 Jul 2010 13:04

On Thu, Jul 15, 2010 at 4:24 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> The problem with not doing that is it breaks hashing --- hash joins and
> hash aggregation being the real pain points.
>
> citext works around this in a rather klugy fashion by decreeing that two
> strings are equal iff their str_tolower() conversions are bitwise equal.
> So it can hash the str_tolower() representation.  But that's kinda slow
> and it fails in the general case anyhow, I think.

I think the general equivalent would be to call strxfrm and hash the
result of that.

--
greg
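[Editor's note: Greg's suggestion works because POSIX guarantees that strcoll(a, b) == 0 exactly when the strxfrm() images of a and b compare equal with strcmp(); hashing the transformed bytes therefore gives equal hashes to any strings the collation considers equal. A minimal sketch, with invented function names; a real per-column implementation would transform under the column's locale rather than the process-wide LC_COLLATE:]

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* djb2-style hash over a byte buffer. */
uint32_t hash_bytes(const char *buf, size_t len)
{
    uint32_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char) buf[i];
    return h;
}

/* Hash a string's strxfrm() image under the current LC_COLLATE locale,
 * so strcoll-equal strings get equal hash codes. */
uint32_t hash_strxfrm(const char *s)
{
    /* A zero-size call just reports the transformed length needed. */
    size_t need = strxfrm(NULL, s, 0);
    char *buf = malloc(need + 1);
    strxfrm(buf, s, need + 1);
    uint32_t h = hash_bytes(buf, need);
    free(buf);
    return h;
}
```

In the C locale strxfrm() is essentially the identity transform; the interesting behavior (distinct byte strings collapsing to the same sort key) only appears under a real collation locale installed with setlocale(LC_COLLATE, ...).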
From: Jaime Casanova on 2 Aug 2010 02:43

On Tue, Jul 13, 2010 at 1:25 PM, Peter Eisentraut <peter_e(a)gmx.net> wrote:
> Here is a proof of concept for per-column collation support.

Hi,

i was looking at this.

nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
clause similar to lc_collate? i mean, is lc_collate what we will use
as a default?

if yes, then probably we need to use pg_collation there too, because
lc_collate and the new collate clause use different collation names.
"""
postgres=# create database test with lc_collate 'en_US.UTF-8';
CREATE DATABASE
test=# create table t1 (col1 text collate "en_US.UTF-8");
ERROR:  collation "en_US.UTF-8" does not exist
test=# create table t1 (col1 text collate "en_US.utf8");
CREATE TABLE
"""

also i got errors from the regression tests when MULTIBYTE=UTF8
(attached). it seems i was trying to create locales that weren't
defined in locales.txt (where is that file fed from?). i added a line
to that file (for es_EC.utf8), then created a table with a column
using that collation and executed "select * from t2 where col1 > 'n';"
and i got this error: "ERROR:  could not create locale "es_EC.utf8""

(of course, that last part was me messing things up, but it shows we
shouldn't be using a locales.txt file, i think)

i can attach a collation to a domain, but i can't see where we are
storing that info (actually it says it's not collatable):

--
Jaime Casanova     www.2ndQuadrant.com
Soporte y capacitación de PostgreSQL
From: Peter Eisentraut on 3 Aug 2010 12:32
On mån, 2010-08-02 at 01:43 -0500, Jaime Casanova wrote:
> nowadays, CREATE DATABASE has a lc_collate clause. is the new collate
> clause similar to lc_collate? i mean, is lc_collate what we will use
> as a default?

Yes, if you do not specify anything per column, the database default is
used.  How to integrate the per-database or per-cluster configuration
with the new system is something to figure out in the future.

> if yes, then probably we need to use pg_collation there too, because
> lc_collate and the new collate clause use different collation names.
> """
> postgres=# create database test with lc_collate 'en_US.UTF-8';
> CREATE DATABASE
> test=# create table t1 (col1 text collate "en_US.UTF-8");
> ERROR:  collation "en_US.UTF-8" does not exist
> test=# create table t1 (col1 text collate "en_US.utf8");
> CREATE TABLE
> """

This is something that libc does for you.  The locale as listed by
locale -a is called "en_US.utf8", but apparently libc takes
"en_US.UTF-8" as well.

> also i got errors from the regression tests when MULTIBYTE=UTF8
> (attached). it seems i was trying to create locales that weren't
> defined in locales.txt (where is that file fed from?). i added a line
> to that file (for es_EC.utf8), then created a table with a column
> using that collation and executed "select * from t2 where col1 > 'n';"
> and i got this error: "ERROR:  could not create locale "es_EC.utf8""
> (of course, that last part was me messing things up, but it shows we
> shouldn't be using a locales.txt file, i think)

It might be that you don't have those locales installed in your system.
locales.txt is created by using locale -a.  Check what that gives you.

> i can attach a collation to a domain, but i can't see where we are
> storing that info (actually it says it's not collatable):

Domain support is not done yet.
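[Editor's note: the reason libc "takes en_US.UTF-8 as well" is that glibc normalizes the codeset part of a locale name when looking it up, roughly by lowercasing it and dropping non-alphanumeric characters, so "UTF-8" and "utf8" resolve to the same installed locale. A rough sketch of that normalization follows; this is a simplification of glibc's internal behavior, not a public API, and the function name is invented here:]

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Normalize a locale codeset roughly the way glibc does internally
 * when resolving locale names: keep only alphanumeric characters,
 * lowercased.  "UTF-8" and "utf8" both normalize to "utf8". */
char *normalize_codeset(const char *codeset)
{
    size_t len = strlen(codeset);
    char *out = malloc(len + 1);
    size_t j = 0;
    for (size_t i = 0; i < len; i++)
        if (isalnum((unsigned char) codeset[i]))
            out[j++] = (char) tolower((unsigned char) codeset[i]);
    out[j] = '\0';
    return out;
}
```

This also explains the asymmetry Jaime hit: the patch's pg_collation entries come verbatim from locale -a (which prints the normalized spelling, "en_US.utf8"), while CREATE DATABASE hands the string straight to libc, which accepts either spelling.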