From: Tom Lane on 27 May 2010 09:55

Tatsuo Ishii <ishii(a)postgresql.org> writes:
> What is your locale?
>> It was en_EN.UTF-8. Interesting. With C it fails...
> Yes, pg_trgm seems to have problems with multibyte + C locale.

It's not a problem, it's just pilot error, or possibly inadequate
documentation. pg_trgm uses the locale's definition of "alpha",
"digit", etc. In C locale only basic ASCII letters and digits will be
recognized as word constituents.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
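The point about the C locale can be illustrated with plain <ctype.h> calls. The sketch below is not pg_trgm code; the helper names count_alnum_bytes and count_alnum_bytes_in_c_locale are made up for illustration, and it classifies single bytes only, which is an assumption about how a C-locale build would see a UTF-8 string. It relies on the C standard's guarantee that in the "C" locale isalpha() and isdigit() accept only the basic ASCII letters and digits.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from pg_trgm): count the bytes of s that the
 * current LC_CTYPE locale classifies as a letter or a digit.
 */
static size_t
count_alnum_bytes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
    {
        /* Cast first: passing a negative char to isalpha() is undefined. */
        unsigned char c = (unsigned char) *s;

        if (isalpha(c) || isdigit(c))
            n++;
    }
    return n;
}

/* Same count, with LC_CTYPE explicitly forced to the "C" locale first. */
static size_t
count_alnum_bytes_in_c_locale(const char *s)
{
    setlocale(LC_CTYPE, "C");
    return count_alnum_bytes(s);
}
```

With this sketch, an ASCII string such as "abc123" counts six word-constituent bytes, while the UTF-8 bytes of a Japanese word count none, so under these assumptions nothing would be left to build trigrams from.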

From: Tatsuo Ishii on 27 May 2010 10:05

> > Yes, pg_trgm seems to have problems with multibyte + C locale.
>
> It's not a problem, it's just pilot error, or possibly inadequate
> documentation. pg_trgm uses the locale's definition of "alpha",
> "digit", etc. In C locale only basic ASCII letters and digits will be
> recognized as word constituents.

That means there is no chance to make pg_trgm work with multibyte + C
locale? If so, I will leave pg_trgm as it is and provide private
patches for those who need the functionality.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

From: Tom Lane on 27 May 2010 10:15

Tatsuo Ishii <ishii(a)postgresql.org> writes:
>> It's not a problem, it's just pilot error, or possibly inadequate
>> documentation. pg_trgm uses the locale's definition of "alpha",
>> "digit", etc. In C locale only basic ASCII letters and digits will be
>> recognized as word constituents.

> That means there is no chance to make pg_trgm work with multibyte + C
> locale? If so, I will leave pg_trgm as it is and provide private
> patches for those who need the functionality.

Exactly what do you consider to be the missing functionality?
You need a notion of word vs non-word character from somewhere,
and the locale setting is the standard place to get that. The
core text search functionality behaves the same way.

regards, tom lane

From: Tatsuo Ishii on 27 May 2010 10:20

> Exactly what do you consider to be the missing functionality?
> You need a notion of word vs non-word character from somewhere,
> and the locale setting is the standard place to get that. The
> core text search functionality behaves the same way.

No. Text search works fine with multibyte + C locale.

Anyway, locale is completely useless for finding word vs non-word
characters in an agglutinative language such as Japanese.

From: Tom Lane on 27 May 2010 10:24

Tatsuo Ishii <ishii(a)sraoss.co.jp> writes:
> Anyway, locale is completely useless for finding word vs non-word
> characters in an agglutinative language such as Japanese.

Well, that doesn't mean that the answer is to use C locale ;-)

However, you could possibly think about making this bit of code
more flexible:

    #ifdef KEEPONLYALNUM
    #define iswordchr(c)    (t_isalpha(c) || t_isdigit(c))
    #else
    #define iswordchr(c)    (!t_isspace(c))
    #endif

Currently it seems to be hard-wired to the first case in standard
builds.

regards, tom lane
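
To see what the two iswordchr() definitions would mean in practice, here is a minimal sketch that compares them on raw bytes. It is not the pg_trgm source: word_chr_alnum, word_chr_nonspace and count_words are hypothetical names, and the standard isalpha(), isdigit() and isspace() stand in for t_isalpha(), t_isdigit() and t_isspace(), which is an assumption about how those behave in the C locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the KEEPONLYALNUM variant of iswordchr(). */
static int
word_chr_alnum(unsigned char c)
{
    return isalpha(c) || isdigit(c);
}

/* Hypothetical stand-in for the other variant: anything but whitespace. */
static int
word_chr_nonspace(unsigned char c)
{
    return !isspace(c);
}

typedef int (*word_chr_fn) (unsigned char);

/* Count maximal runs of "word characters" in s under the given predicate. */
static size_t
count_words(const char *s, word_chr_fn is_word_chr)
{
    size_t words = 0;
    int    in_word = 0;

    for (; *s != '\0'; s++)
    {
        int w = is_word_chr((unsigned char) *s) != 0;

        if (w && !in_word)
            words++;            /* a new run of word characters starts here */
        in_word = w;
    }
    return words;
}
```

In the default C locale, a string made of a UTF-8 Japanese word, a space and "go" yields one word with the alphanumeric predicate (only "go" survives) and two words with the non-whitespace predicate. The trade-off is that the second predicate also treats punctuation as part of a word, so "a-b" becomes one word instead of two.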