From: aldnin on 21 Jul 2007 12:24

> Please configure your email client so we don't receive 5 copies of your
> mail.

Just fixed that issue, so don't worry about it in the future.

> This indicates that PHP is not using UTF-8. That output is typical of
> UTF-8 output as Latin characters.

Well, maybe the output is not correct: when I run the PHP script on the console (CLI) it prints the content in the wrong charset. But that's not the problem; calling utf8_decode() lets me output it in the right charset.

> Not true, it only indicates that phpPgAdmin is configured to handle
> UTF-8 correctly.

Well, I searched all the source code of phpPgAdmin for charsets and found:

    echo "\t<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$data->codemap[$dbEncoding]}\" />\r\n";

So phpPgAdmin sets the output charset to the charset used by the database it is connected to. But that's still not the problem, because I also know how to fix charset output in browsers.

> Once again indicating your data needs to be converted from some other
> character set.

It's already converted to be compatible with UTF-8 when I fetch it from the other resources.

> I had similar problems getting PHP to work with UTF-8 and MySQL. Many
> of PHP's functions are not multibyte aware and assume a Latin character set.
> What, if any, output buffering are you using? What is your
> default_charset set to?

Well, I've set default_charset to UTF8; it was "" (empty) before. But the output on the console (CLI) and the problem are still the same after the change, so this is not the problem. I also don't need proper console output without utf8_decode(): if I want proper output there I just decode, as I do when I want it displayed properly in the browser.

Maybe a cleaner explanation of the problem:

I fetch something from the database which looks like "lacarrière" when I output it in PHP (let's not get confused by PHP's output). Then I fetch something from another resource which looks like "lacarrière". When I compare the two strings in PHP, it tells me they are "not equal". So I HAVE TO call either utf8_encode() on the string from the other resource OR utf8_decode() on the string from the database for them to compare as "equal" ... and THIS means a lot more code in my classes.

Hint: The other resource is a socket connection (API) to another server.

The problem is quite simple, I think: everything coming from the database is UTF-8 byte encoded and needs to be UTF-8 decoded before you can work with it properly. default_charset seems to affect only the output buffer, so the only real fix would be a mechanism that tells PHP to handle all strings as UTF-8 byte encoded, which would take a lot more resources. I understand that this is not a solution.

So the only solutions could be:

a) Encode and decode the UTF-8 data properly, and take care that UTF-8 byte-encoded content is decoded before it is used with other strings.
b) A mechanism to tell the pg functions in PHP to decode all data which is UTF-8 encoded. The ADODB layer seems to do that properly, but as far as I can see the pg functions don't.

You can use this to reproduce it:

1. Create a table in Postgres, in a UTF8-initialized database, and insert something like "lacarrière" into it. Check that it's inserted correctly.
2. Check the normal output with psql; you should get either "lacarrière" or "lacarrière", so you can be sure it's inserted correctly.
3. Write a script which fetches the string from the database into $dbString.
4. Set a string $phpString = "lacarrière";
5. Compare both strings with "==": you'll get "false".

Another hint: Try sending "select 'lacarrière' as test;" with pg_query to any Postgres database. You'll get an error. If not... well, then I'm wrong and I've set up PHP incorrectly for UTF-8.

If you send "select '" . utf8_encode("lacarrière") . "' as test;" to your database, this should work. Also, as described above, $phpString is NOT EQUAL to the result you get from that query; you have to compare it to utf8_decode($dbString) for them to be EQUAL.
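For anyone following along, the "not equal" result can be reproduced without a database. This is only a minimal sketch: it assumes the socket API delivers ISO-8859-1 bytes (the thread does not say so explicitly) and uses byte escapes so the result does not depend on how the script file itself is saved. The variable names are illustrative.

```php
<?php
// UTF-8 bytes, as a UTF8-initialized database would return them.
$dbString = "lacarri\xC3\xA8re";
// ISO-8859-1 bytes, assumed here for the string from the socket API.
$apiString = "lacarri\xE8re";

var_dump($dbString == $apiString); // bool(false)
echo bin2hex($dbString), "\n";     // 6c616361727269c3a87265
echo bin2hex($apiString), "\n";    // 6c616361727269e87265

// Convert once at the boundary, then compare like with like.
$apiAsUtf8 = mb_convert_encoding($apiString, 'UTF-8', 'ISO-8859-1');
var_dump($dbString === $apiAsUtf8); // bool(true)

// The opposite direction works too, if the rest of the code is Latin-1.
$dbAsLatin1 = mb_convert_encoding($dbString, 'ISO-8859-1', 'UTF-8');
var_dump($dbAsLatin1 === $apiString); // bool(true)
```

The two strings differ by one byte sequence (c3a8 versus e8), which is why "==" fails until one side is converted.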
From: aldnin on 21 Jul 2007 15:42

> You did not answer the most important question. What, if any, output
> buffering are you using? Are you using the mbstring module? If so, is
> it set to overload the old string functions?

Well, I checked for the Multibyte String functions: mbstring is enabled, and it was configured with "=all" before compiling.

After performing the query with pg_query, fetching the result with pg_fetch_all and putting the UTF-8 string into $dbString, I tried to detect the encoding with:

    mb_detect_encoding($dbString)

It tells me: ASCII

The content of $dbString is: lacarrière

I overloaded the mbstring variables with:

    mbstring.func_overload = 6

Setting it to "7" won't let me even echo something else.

    mbstring.encoding_translation = On
    mbstring.internal_encoding = UTF8

That's it, the rest is default.

Is it possible for mbstring to overload the pg functions I need?
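A side note on the detection step: when no candidate list is passed, mb_detect_encoding() uses mbstring.detect_order, so its answer depends on configuration. A hedged sketch of a more direct check follows; the sample strings are made up for illustration, and the output of the two detection calls is deliberately not annotated because it can vary between PHP versions.

```php
<?php
$utf8   = "lacarri\xC3\xA8re"; // valid UTF-8
$latin1 = "lacarri\xE8re";     // ISO-8859-1 bytes, not valid UTF-8

// mb_check_encoding() answers a yes/no question instead of guessing.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Detection with an explicit candidate list and strict mode, so the
// result does not depend on the detect_order ini setting.
var_dump(mb_detect_encoding($utf8, ['ASCII', 'UTF-8'], true));
var_dump(mb_detect_encoding($latin1, ['ASCII', 'UTF-8'], true));
```

If the question is only "is this string valid UTF-8?", the yes/no check is usually enough and avoids the guessing altogether.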
From: aldnin on 21 Jul 2007 20:28

Thanks a lot. What you're writing is really necessary for handling these problems in the future. The reason I was looking for a faster solution is that I have to handle huge amounts of data which are sometimes UTF-8 and sometimes not... you understand what I mean? ;-)

Bruno Lustosa wrote:
> On 7/21/07, aldnin <aldnin(a)yahoo.de> wrote:
>> When I try to send this query (select 'lacarrière' as test;) to a UTF8
>> initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0xe87265
>
> Short answer: start using utf-8 for just everything, and your problems
> will be gone.
>
> Long explanation:
> This is usually the case when you get data from a form and put it in
> the database, and the two aren't using the same encoding.
> I guess your pg connection is using unicode (so the db expects unicode
> input), and your html is set to something else. To fix this, you have
> two choices:
>
> 1-Run utf8_encode() on the input from your forms; or
> 2-Set all your html pages to use utf-8 encoding.
>
> IMHO, option 2 is the way to go. I've been using utf-8 for everything
> for quite some time, and it has solved all my problems dealing with
> accents, and so on.
> You will need:
> - All your HTML files encoded to utf-8 (quite easy with iconv, if you
> are using Linux);
> - Add a "Content-type: text/html; charset=utf-8" to all your pages.
> This is easily done using PHP's header() function in a file included
> by all your scripts.
>
> This way, the pages will be unicode, any data entered will be posted
> as unicode, and you will have no problems sending them to a database
> that uses unicode.
> Forget the <meta> tag that sets the encoding. It's only used in case
> the server doesn't send a Content-type header, which isn't the case
> normally. By default, I think at least apache sends the content-type
> as iso8859-1.
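Bruno's option 2, plus a guard for data that is "sometimes UTF-8 and sometimes not", might look roughly like the sketch below. The helper name ensure_utf8() is hypothetical, and treating every string that is not valid UTF-8 as ISO-8859-1 is an assumption about the data, not something the thread confirms.

```php
<?php
// Send the charset in the HTTP header before any output is produced.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

/**
 * Hypothetical helper: return $s as UTF-8. Input that is not valid UTF-8
 * is ASSUMED to be ISO-8859-1; adjust the fallback for other sources.
 */
function ensure_utf8(string $s): string
{
    return mb_check_encoding($s, 'UTF-8')
        ? $s
        : mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$fromForm = "lacarri\xE8re"; // ISO-8859-1 bytes, as in the failing query

// The three bytes PostgreSQL reports in the error message.
echo bin2hex(substr($fromForm, 7, 3)), "\n"; // e87265

$safe = ensure_utf8($fromForm);
echo bin2hex($safe), "\n"; // 6c616361727269c3a87265

// Strings that are already UTF-8 pass through unchanged.
var_dump(ensure_utf8($safe) === $safe); // bool(true)
```

The guard is a heuristic: a Latin-1 string can occasionally happen to be valid UTF-8, so converting at a known boundary (form input, socket read) is more reliable than guessing later.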
From: aldnin on 21 Jul 2007 20:51

> output_handler=mb_output_handler

This helped me fix all output to the browser, so I don't need any utf8_decode() calls any more, thanks.

> Setting it to "7" won't let me even echo something else.

Right, it's strange, but true... :-(

> mbstring.detect_order = UTF-8,eucjp-win,sjis-win

That solved the problem of mb_detect_encoding() returning ASCII. It now says "UTF-8", BUT only when I run the script on the console; in the browser it still tells me ASCII. Well, not important.

But the comparison test still says "not equal", so utf8_decode() is still needed when data comes from the database. The result is the same in the browser and on the console (even though the console shows UTF-8 as detected).

> The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console. So unless you're certain yours does,
> I wouldn't use it as a test.

I know, that's why I use the comparison test ;-)

Niel wrote:
> Hi
>
> You still haven't answered whether you're using any output handler, and
> if so which one. I use
>
> output_handler=mb_output_handler
>
>> I overloaded the mbstring variables with:
>> mbstring.func_overload = 6
>> Setting it to "7" won't let me even echo something else.
>
> Very strange, the only additional function overloaded is mail() and that
> shouldn't stop you using echo.
>
> As well as setting the internal encoding and enabling it with
> mbstring.encoding_translation = On
> mbstring.internal_encoding = UTF-8
>
> I would also use:
> mbstring.language = English
> ; or German in your case
> mbstring.detect_order = UTF-8,eucjp-win,sjis-win
> mbstring.http_input = UTF-8,SJIS,EUC-JP
> mbstring.http_output = UTF-8
>
>> Is it possible for mbstring to overload the pg-functions I need?
> No, and it shouldn't be needed. Those functions should be UTF-8 enabled
> in order to communicate with the database and supply the correct data.
>
> You're still referring to 'UTF8' which as I pointed out isn't the
> official name of the encoding system. I have no idea if PHP will
> recognise it, but to be safe I suggest you use the official 'UTF-8'
> (hyphen between letters and number) in case it's causing problems.
> The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console. So unless you're certain yours does,
> I wouldn't use it as a test.
>
> --
> Niel Archer
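On the question of making the pg functions do the conversion: the pgsql extension has pg_set_client_encoding() and pg_client_encoding(), which ask the server to convert between the database encoding and the encoding the client declares. The sketch below is untested against the poster's setup; the wrapper name use_client_encoding() and the PG_DSN environment variable are hypothetical, and LATIN1 is only an assumption about what the rest of the script uses.

```php
<?php
/**
 * Hypothetical wrapper: declare the encoding this script works in, so
 * PostgreSQL converts query text and results on the wire.
 * $conn is a connection returned by pg_connect().
 */
function use_client_encoding($conn, string $encoding): string
{
    // pg_set_client_encoding() returns 0 on success and -1 on error.
    if (pg_set_client_encoding($conn, $encoding) !== 0) {
        throw new RuntimeException("Could not set client encoding to $encoding");
    }
    return pg_client_encoding($conn);
}

// PG_DSN is a made-up variable holding a pg_connect() connection string.
// Nothing below runs unless it is set and the pgsql extension is loaded.
$dsn = getenv('PG_DSN');
if ($dsn !== false && function_exists('pg_connect')) {
    $conn = pg_connect($dsn);
    if ($conn !== false) {
        echo use_client_encoding($conn, 'LATIN1'), "\n";
        // With a LATIN1 client encoding, Latin-1 bytes are valid query text.
        $result = pg_query($conn, "select 'lacarri\xE8re' as test");
        $row = pg_fetch_assoc($result);
        var_dump($row['test'] === "lacarri\xE8re");
    }
}
```

Whether it is better to declare LATIN1 here or to convert the socket data to UTF-8 depends on which encoding the rest of the application is meant to use; Bruno's "UTF-8 everywhere" advice points to the latter.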