From: Itagaki Takahiro on 17 Nov 2009 02:40

Peter Eisentraut <peter_e(a)gmx.net> wrote:

> I think I could support using the presence of the BOM as a fall-back
> indicator of encoding in absence of any other declaration.

What is the difference between the fall-back and <<set client encoding to UTF-8 if BOM found>>? My reading of this discussion is that we cannot accept any automatic encoding detection (properly speaking, detection is OK, but automatic assignment is not). We should not have any fall-back mechanism, no?

> Also, when the proposed patch to set the encoding from the locale
> appears, we need to make this logic more precise.

The encoding-from-locale feature will be useful, but the patch does *not* set any encodings. The reason is the same as above.

> Also, I'm not sure if we need this logic only when we send a query. It
> might be better to do this in the lexer when we find a non-ASCII
> character and we don't have a client encoding != SQL_ASCII set yet.

Absolutely, but is it an issue independent of the BOM? Multi-byte scripts without encoding are always dangerous whether BOM is present or not. I'd say we can always throw an error when we find queries that contain multi-byte characters if no prior encoding declaration.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
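For illustration only, the check proposed in this message (reject multi-byte input when no encoding has been declared) might be sketched as below. This is not psql code. The function names are hypothetical, and treating a client encoding of SQL_ASCII as "no declaration" is an assumption taken from the wording of the quoted text.

```python
def first_non_ascii_offset(query: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if the input is pure ASCII."""
    for offset, byte in enumerate(query):
        if byte >= 0x80:
            return offset
    return -1


def check_undeclared_encoding(query: bytes, client_encoding: str) -> None:
    """Raise ValueError for non-ASCII input while the encoding is still SQL_ASCII.

    Assumption: SQL_ASCII stands in for "no encoding declared yet".
    """
    if client_encoding.upper() != "SQL_ASCII":
        return  # an encoding was declared, so there is nothing to check here
    offset = first_non_ascii_offset(query)
    if offset >= 0:
        raise ValueError(
            f"non-ASCII byte at offset {offset} with no declared client encoding"
        )
```

The check only looks at raw bytes, so it behaves the same whether or not a BOM is present, which is the point being made in the message.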
From: Andrew Dunstan on 17 Nov 2009 09:08

Itagaki Takahiro wrote:

> Multi-byte scripts
> without encoding are always dangerous whether BOM is present or not.
> I'd say we can always throw an error when we find queries that contain
> multi-byte characters if no prior encoding declaration.

You will break a gazillion scripts that today work quite happily if you do. I think you have really not thought out these proposals well.

Maybe there is a case for an extra command line switch to set the initial client encoding for psql, which would make that a little easier and less obscure to do. Would that make things simpler for you?

cheers

andrew
From: Tom Lane on 17 Nov 2009 10:50

Peter Eisentraut <peter_e(a)gmx.net> writes:

> I think I could support using the presence of the BOM as a fall-back
> indicator of encoding in absence of any other declaration. It seems to
> me, however, that the description above ignores the existence of
> encodings other than SQL_ASCII and UTF8.

Yeah. This entire proposal rests on the assumption that UTF8 is the only encoding that really matters, and introducing a possibility of breaking things for users of other encodings is acceptable damage. I do not think that supporting a deprecated-by-standards behavior is worth that.

Even assuming that we had consensus on a behavior that involved silently changing client_encoding, I do not believe that it's practical to implement it in an acceptable fashion. Just issuing a SET behind the user's back will not work in a number of scenarios:

* We are inside a transaction when \i is called, and the file contains a ROLLBACK.
* We are inside a failed transaction when \i is called --- the SET won't even work at all.
* Same two cases inside a savepoint.
* The file contains a \c command.

If you expect that the previous client_encoding should be restored at the end of the \i inclusion (as I certainly would) then you have the first three hazards at file end as well, except that now the odds of being inside a failed transaction are significantly higher.

Also, what if the file contained a SET CLIENT_ENCODING command itself? How should that interact with this?

Lastly, a silent change of client_encoding would also affect the encoding of notice and error messages that come out while the \i file is running. I fail to find that non-astonishing, either.

I think that the only way this sort of behavior could be implemented without a bunch of broken corner cases would be if we put the responsibility of encoding conversion inside psql, so that switching its idea of the encoding was just a local change rather than something it had to ask the backend to do, and it could be careful to apply the encoding only to the data coming from the \i file. Which is possible, perhaps, but it hardly seems that slightly-more-convenient BOM handling is worth it.

regards, tom lane
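The client-side alternative described in the last paragraph above could look roughly like the following sketch. It is an assumption-laden illustration, not a description of psql: the function name is hypothetical, and `session_encoding` is a Python codec name (such as "latin-1") standing in for whatever encoding the session already uses. The file's bytes are converted locally and nothing is sent to the server to change client_encoding.

```python
import codecs


def recode_included_file(data: bytes, session_encoding: str) -> bytes:
    """Convert a BOM-marked UTF-8 file to the session's encoding, locally.

    Files without a UTF-8 BOM are returned unchanged. Raises
    UnicodeEncodeError if a character cannot be represented in
    session_encoding, and UnicodeDecodeError if the body is not valid UTF-8.
    """
    if not data.startswith(codecs.BOM_UTF8):
        return data
    text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    return text.encode(session_encoding)
```

Because the conversion touches only the file's own bytes, transaction state, savepoints, \c, and the encoding of server messages are unaffected, which is why this approach sidesteps the hazards listed above.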
From: Peter Eisentraut on 17 Nov 2009 12:03

On Tue, 2009-11-17 at 09:31 +0900, Itagaki Takahiro wrote:

> Peter Eisentraut <peter_e(a)gmx.net> wrote:
>
> > OK, I think the consensus here is:
> > - Eat BOM at beginning of file (as you implemented)
> > - Only when client encoding is UTF-8 --> please fix that
>
> Are they AND condition? If so, this patch will be useless.
> Please remember \encoding or SET client_encoding appear
> *after* BOM at beginning of file.

Presumably, if you have editors throwing in BOMs without asking, you have an environment where either

a) you can set the client encoding to UTF8 in the environment, so it applies by default, or

b) the server encoding is UTF8, so the client encoding will default to that.

Together, that should cover a lot of cases. Not perfect, but far from useless.
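As a rough illustration of the two-part rule quoted above (eat the BOM at the beginning of the file, but only when the client encoding is UTF-8), a minimal sketch might read as follows. The function names are hypothetical and the encoding-name normalization is a simplifying assumption; this is not the patch under discussion.

```python
import codecs


def normalize_encoding_name(name: str) -> str:
    """Fold spellings such as 'utf-8', 'UTF_8' and 'utf8' to 'UTF8'."""
    return name.replace("-", "").replace("_", "").upper()


def strip_bom_if_utf8(data: bytes, client_encoding: str) -> bytes:
    """Drop a leading UTF-8 BOM only when the client encoding is already UTF-8."""
    is_utf8 = normalize_encoding_name(client_encoding) == "UTF8"
    if is_utf8 and data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data  # other encodings: leave the bytes untouched
```

Under this rule the BOM never changes the encoding. It is only skipped when it could not mean anything else, so users of other encodings see no change in behavior.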
From: Peter Eisentraut on 17 Nov 2009 12:05
On Tue, 2009-11-17 at 00:59 -0800, Chuck McDevitt wrote:

> Or is there a plan to read and convert the UTF-16 or UTF-32 to UTF-8,
> so psql and PostgreSQL understand it?
> (BTW, that would actually be nice on Windows, where UTF-16 is common).

Well, someone could implement UTF-16 or UTF-whatever as a client encoding. But I have not heard of any concrete proposals about that.
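No such proposal exists in this thread, but the conversion asked about in the quoted question could in principle be sketched as below: detect a UTF-16 or UTF-32 BOM and transcode the file to UTF-8 before it is processed. The function name and the table of BOMs are illustrative assumptions.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE).
_BOM_CODECS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def to_utf8(data: bytes) -> bytes:
    """Transcode BOM-marked UTF-8/16/32 input to BOM-less UTF-8.

    Input without a recognized BOM is returned unchanged.
    """
    for bom, codec in _BOM_CODECS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec).encode("utf-8")
    return data
```

One caveat of BOM-only detection: a UTF-16 LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32 LE, so a sketch like this would misread it.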