From: Tatsuo Ishii on 30 May 2010 10:52

> > > This is in 9.0, because 8.4 doesn't recognize the \u escape syntax. If
> > > you run this in 8.4, you're just comparing a sequence of ASCII letters
> > > and digits.
> >
> > Hum. Still I prefer 8.4's behavior since anything is better than
> > returning NaN. It seems 9.0 does not have any escape route for
> > multibyte+C locale users.
>
> I think you are confusing some things here. The \u escape syntax is for
> string literals in general. The behavior of pg_trgm is still the same
> in 8.4 and in 9.0. It's just easier in 9.0 to write out examples
> relevant to the current problem.

I just wanted to point this out from the users' point of view. I do not
object to the new \u escape syntax. I think pg_trgm has a problem. But Tom
thinks that it's not a problem. That's the point.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on 30 May 2010 12:59

On Sun, May 30, 2010 at 3:41 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> I don't think it's unreasonable to insist that behavioral changes be
> made in an upward compatible fashion ... especially ones that seem at
> least as likely to break some current usages as to enable new usages.

Fwiw I don't think we've traditionally been so tense about contrib
modules. With the advent of extensions that users can easily install
with a single command that might be about to change though.

There seem to be three behaviours on the table here:

1) Status quo -- only alpha and digit characters for the current
   locale are considered word elements

2) All characters aside from space characters for the current locale
   are considered word elements

3) Alpha and digit characters for the current locale, and for C locale
   any non-ascii (high bit set) character is considered a word element

1 -> 3 seems like a pretty safe change considering that anyone using
non-ascii characters in C locale probably isn't using pg_trgm or they
would be complaining about it already. How big a user-base do we think
pg_trgm has anyways? How many of those are using it on non-ascii
characters in C locale? And of those how many expect the non-ascii
characters to be considered non-word characters? It doesn't sound like
terribly useful behaviour to me.

Behaviour 2 also seems like it would be useful so providing it as well
is also a perfectly reasonable option. But I agree that 1 -> 2 would be a
user-visible change for basically all users so it would have to be an
additional option.

--
greg

From: Tom Lane on 30 May 2010 13:51

Greg Stark <gsstark(a)mit.edu> writes:
> There seem to be three behaviours on the table here:

You're neglecting

4) Let the user decide whether he wants pg_trgm to consider word
   elements to be "alphanumerics" or "any non-space".

The main problem I have with Tatsuo's patch is that it forecloses any
reasonably upward-compatible extension to a user-selected behavior
like (4). The current behavior can be extended and is simple to
document (though we've neglected to do so). But once you've put in
this arbitrary warping of the behavior of C locale, you're going to be
at a dead end for improving it later.

regards, tom lane
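
For readers following the thread, the word-element rules discussed above can be sketched as a per-byte classification. This is only an illustration in Python, not pg_trgm's actual C code: the names `WordMode`, `is_word_byte` and `split_words` are hypothetical, and the C locale is approximated by treating only ASCII letters and digits as alphanumeric. Under that assumption, option (4) amounts to letting the caller choose between the first two modes.

```python
from enum import Enum
from typing import List


class WordMode(Enum):
    """Hypothetical names for the behaviours listed in the thread."""
    ALNUM = 1              # (1) status quo: alphanumerics only
    NON_SPACE = 2          # (2) anything that is not whitespace
    ALNUM_OR_HIGH_BIT = 3  # (3) alphanumerics, plus any byte with the high bit set


def is_word_byte(b: int, mode: WordMode) -> bool:
    """Classify one byte, assuming C-locale rules (ASCII-only alnum/space)."""
    ch = bytes([b])
    # bytes.isalnum() is True only for ASCII letters and digits,
    # which stands in here for isalnum() under the C locale.
    ascii_alnum = ch.isalnum()
    if mode is WordMode.ALNUM:
        return ascii_alnum
    if mode is WordMode.NON_SPACE:
        # bytes.isspace() recognises ASCII whitespace only.
        return not ch.isspace()
    return ascii_alnum or b >= 0x80


def split_words(data: bytes, mode: WordMode) -> List[bytes]:
    """Split a byte string into runs of word bytes under the chosen mode."""
    words: List[bytes] = []
    current = bytearray()
    for b in data:
        if is_word_byte(b, mode):
            current.append(b)
        elif current:
            words.append(bytes(current))
            current.clear()
    if current:
        words.append(bytes(current))
    return words


if __name__ == "__main__":
    sample = "日本語 abc-1".encode("utf-8")
    for mode in WordMode:
        print(mode.name, split_words(sample, mode))
```

With multibyte text in a C-locale database, the first mode finds no word elements in the non-ASCII part at all, which is the situation Tatsuo describes; the other two modes keep those bytes, and they differ only in how punctuation such as `-` is treated.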