From: Greg Stark on
On Sat, May 29, 2010 at 9:13 AM, Tatsuo Ishii <ishii(a)postgresql.org> wrote:
> ! #define iswordchr(c)  (lc_ctype_is_c() ? \
> !     ((*(c) & 0x80) ? !t_isspace(c) : (t_isalpha(c) || t_isdigit(c))) : \
>

Surely isspace(c) will always be false for non-ASCII characters in the
C locale?

Now it might be sensible to just treat any non-ASCII character as a
word character in addition to alpha and digits, so what might make
sense is

(t_isalpha(c) || t_isdigit(c)) || (lc_ctype_is_c() && (*(c) & 0x80))

Though I wonder whether it wouldn't be generally more useful to users
to provide the non-space version as an option. I could see that being
useful for people in other circumstances aside from working around
this locale problem.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tatsuo Ishii on
> > Wait. This works fine for me with stock pg_trgm. Locale is C and
> > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > 8.4.4.
>
> This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> you run this in 8.4, you're just comparing a sequence of ASCII letters
> and digits.

Hum. Still I prefer 8.4's behavior since anything is better than
returning NaN. It seems 9.0 does not have any escape route for
multibyte+C locale users.
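
To spell out where a NaN can come from: if no byte of either string
counts as a word character, both trigram sets are empty, and a ratio of
"shared trigrams over distinct trigrams" becomes 0/0. The function
below is a sketch under that assumption; trgm_ratio is a made-up name
and the formula is assumed for illustration, not copied from pg_trgm.

```c
#include <assert.h>
#include <math.h>

/*
 * Hypothetical similarity ratio: trigrams shared by both strings,
 * divided by the number of distinct trigrams in either string.
 * Assumed formula for illustration only.
 */
static float
trgm_ratio(int len1, int len2, int shared)
{
    assert(len1 >= 0 && len2 >= 0 && shared >= 0);

    /* With two empty sets this is 0.0f / 0.0f, which is NaN. */
    return (float) shared / (float) (len1 + len2 - shared);
}
```

Calling trgm_ratio(0, 0, 0) gives a value for which isnan() is true,
while two identical non-empty sets give 1.0.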
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Tatsuo Ishii on
> This is still ignoring the point: arbitrarily changing the module's
> longstanding standard behavior isn't acceptable. You need to provide
> a way for the user to control the behavior. (Once you've done that,
> I think it can be just either "alnum" or "!isspace", but maybe some
> other behaviors would be interesting.)

To be honest I don't know what "module's longstanding standard
behavior" should be. It's not documented anywhere. If you mean it is
whatever the current implementation does, then any effort to touch the
module should be prohibited.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Peter Eisentraut on
On Sun, 2010-05-30 at 11:05 +0900, Tatsuo Ishii wrote:
> > > Wait. This works fine for me with stock pg_trgm. Locale is C and
> > > encoding is UTF8. What version of PostgreSQL are you using? Mine is
> > > 8.4.4.
> >
> > This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> > you run this in 8.4, you're just comparing a sequence of ASCII letters
> > and digits.
>
> Hum. Still I prefer 8.4's behavior since anything is better than
> returning NaN. It seems 9.0 does not have any escape route for
> multibyte+C locale users.

I think you are confusing some things here. The \u escape syntax is for
string literals in general. The behavior of pg_trgm is still the same
in 8.4 and in 9.0. It's just easier in 9.0 to write out examples
relevant to the current problem.



From: Tom Lane on
Tatsuo Ishii <ishii(a)postgresql.org> writes:
>> This is still ignoring the point: arbitrarily changing the module's
>> longstanding standard behavior isn't acceptable. You need to provide
>> a way for the user to control the behavior. (Once you've done that,
>> I think it can be just either "alnum" or "!isspace", but maybe some
>> other behaviors would be interesting.)

> To be honest I don't know what "module's longstanding standard
> behavior" should be. It's not documented anywhere.

Well, that's a documentation problem rather than an argument for
changing the code.

> If you mean it is
> whatever the current implementation does, then any effort to touch the
> module should be prohibited.

I don't think it's unreasonable to insist that behavioral changes be
made in an upward compatible fashion ... especially ones that seem at
least as likely to break some current usages as to enable new usages.

regards, tom lane
