From: Greg Stark on 29 May 2010 10:09

On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii <ishii(a)postgresql.org> wrote:
> ! #define iswordchr(c) (lc_ctype_is_c()? \
> ! ((*(c) & 0x80)? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \

Surely isspace(c) will always be false for non-ASCII characters in the C locale?

Now, it might be sensible to treat any non-ASCII character as a word character in addition to letters and digits, so what might make sense is:

    t_isalpha(c) || t_isdigit(c) || (lc_ctype_is_c() && (*(c) & 0x80))

Though I wonder whether it wouldn't be generally more useful to users to provide the non-space version as an option. I could see that being useful for people in other circumstances aside from working around this locale problem.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: Tatsuo Ishii on 29 May 2010 22:05

> > Wait. This works fine for me with stock pg_trgm. Locale is C and
> > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > 8.4.4.
>
> This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> you run this in 8.4, you're just comparing a sequence of ASCII letters
> and digits.

Hum. Still I prefer 8.4's behavior since anything is better than returning NaN. It seems 9.0 does not have any escape route for multibyte+C locale users.

-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
From: Tatsuo Ishii on 29 May 2010 21:53

> This is still ignoring the point: arbitrarily changing the module's
> longstanding standard behavior isn't acceptable. You need to provide
> a way for the user to control the behavior. (Once you've done that,
> I think it can be just either "alnum" or "!isspace", but maybe some
> other behaviors would be interesting.)

To be honest I don't know what "module's longstanding standard behavior" should be. It's not documented anywhere. If you mean that is whatever the current implementation is, then any effort to touch the module should be prohibited.
From: Peter Eisentraut on 30 May 2010 07:36

On Sun, 2010-05-30 at 11:05 +0900, Tatsuo Ishii wrote:
> > > Wait. This works fine for me with stock pg_trgm. Locale is C and
> > > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > > 8.4.4.
> >
> > This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> > you run this in 8.4, you're just comparing a sequence of ASCII letters
> > and digits.
>
> Hum. Still I prefer 8.4's behavior since anything is better than
> returning NaN. It seems 9.0 does not have any escape route for
> multibyte+C locale users.

I think you are confusing some things here. The \u escape syntax is for string literals in general. The behavior of pg_trgm is still the same in 8.4 and in 9.0. It's just easier in 9.0 to write out examples relevant to the current problem.
From: Tom Lane on 30 May 2010 10:41
Tatsuo Ishii <ishii(a)postgresql.org> writes:
>> This is still ignoring the point: arbitrarily changing the module's
>> longstanding standard behavior isn't acceptable. You need to provide
>> a way for the user to control the behavior. (Once you've done that,
>> I think it can be just either "alnum" or "!isspace", but maybe some
>> other behaviors would be interesting.)

> To be honest I don't know what "module's longstanding standard
> behavior" should be. It's not documented anywhere.

Well, that's a documentation problem rather than an argument for changing the code.

> If you mean that is
> whatever the current implementation is, then any effort to touch the
> module should be prohibited.

I don't think it's unreasonable to insist that behavioral changes be made in an upward-compatible fashion ... especially ones that seem at least as likely to break some current usages as to enable new usages.

regards, tom lane