From: Marcel Bruinsma on 24 Sep 2009 22:31

On Friday, 25 September 2009 01:06, Bill Marcum wrote:
> On 2009-09-24, Marcel Bruinsma <mb(a)nomail.afraid.org> wrote:
>
>> No, the default CTYPE for de is ISO-8859-1.
>
> CP1252 is a superset of ISO-8859-1. The accented letters are the
> same. CP1252 has additional punctuation marks and copyright and
> trademark symbols, among other things (code values 128-159 which
> are undefined in the ISO-8859-* character sets.)

Exactly. Amongst those 'other things' are the frequently used
quotation marks (U+2018..U+201F):

→ printf '“„”\n' | iconv -tlatin1 | iconv -flatin1
iconv: Séquence d'échappement illégale à la position 0
   (French locale; in English: "illegal input sequence at position 0")
→ printf '“„”\n' | iconv -tcp1252 | iconv -fcp1252
“„”

--
printf -v email $(echo \ 155 141 162 143 145 154 142 162 165 151 \
156 163 155 141 100 171 141 150 157 157 056 143 157 155|tr \  \\\\)
#  Live every life as if it were your last!  #
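To check the same point without depending on how the curly quotes survive a terminal or mail client, here is a small shell sketch (not from the thread). It writes U+201C, U+201E and U+201D as UTF-8 octal escapes, converts them with iconv and dumps the resulting bytes with od. It assumes GNU/glibc iconv, which exits non-zero when a character cannot be represented in the target charset; `quotes` and `to_hex` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: which single-byte charset can hold the curly quotes from the demo?
# Assumes GNU/glibc iconv (exits non-zero on unconvertible input) and od.

# U+201C U+201E U+201D, written as UTF-8 octal escapes so the script does
# not depend on the encoding of the editor or terminal it is pasted into.
quotes() { printf '\342\200\234\342\200\236\342\200\235'; }

# Convert UTF-8 on stdin to charset "$1" and print the bytes in hex.
# Returns 1 if iconv refuses the conversion.
to_hex() {
    out=$(iconv -f UTF-8 -t "$1" 2>/dev/null) || return 1
    printf '%s' "$out" | od -An -tx1 | xargs
}

for cs in CP1252 LATIN1; do
    if hex=$(quotes | to_hex "$cs"); then
        echo "$cs: $hex"
    else
        echo "$cs: not representable"
    fi
done
```

On a glibc system this reports the bytes 93 84 94 for CP1252 (all inside the 128-159 range Bill mentions) and a failure for LATIN1.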
From: syd_p on 27 Sep 2009 17:40

On 25 Sep, 03:31, Marcel Bruinsma <m...(a)nomail.afraid.org> wrote:
> → printf '“„”\n' | iconv -tcp1252 | iconv -fcp1252
> “„”

Aha! I get the output below. I'm not quite sure how you did the printf above, though.

And I'm not quite sure what I should set. Should I set, say, LANG and LC_ALL to en_US first and check that out, and then set them to en_US.CP1252?

I did not originally set up the box (actually there are 6 or 8 of them), but I think LANG=C was set because there was a problem with LANG=en_US. I have to be careful here, because I guess I have to reboot to test.

Thanks a lot for the help - I am getting there!!!

$ locale -m | grep '^CP'
CP10007
CP1125
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
CP737
CP775
CP949
$ locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=
From: Marcel Bruinsma on 27 Sep 2009 20:04

On Sunday, 27 September 2009 23:40, syd_p wrote:
>> → printf '“„”\n' | iconv -tlatin1 | iconv -flatin1
>> iconv: Séquence d'échappement illégale à la position 0
>> → printf '“„”\n' | iconv -tcp1252 | iconv -fcp1252
>> “„”
>
> Not quite sure how you did the printf above tho.

The three quotes above are actually encoded in UTF-8, because that is what my terminal understands.

The first iconv on the second printf line converts from UTF-8 (my default in LANG) to CP1252 and doesn't report an error, meaning that those characters are valid in the CP1252 encoding. The second iconv does the inverse: it translates from CP1252 back to UTF-8, and the result is the original string.

The first printf line passes the same UTF-8 encoded quotes to iconv, but asks it to convert to latin1 (ISO-8859-1), and this time iconv says "illegal input sequence", because these quotes do not exist in latin1.

> And not quite sure what I should set to say LANG
> and LC_ALL to en_us first and check that out?

Try:

LANG=en_US.CP1252 locale
LANG=en_US.ISO-8859-15 locale
LANG=en_US.ISO-8859-1 locale
LANG=en_US.UTF-8 locale

and see whether any of these does *not* produce an error like this:

$ LANG=en_US.FOO locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

Obviously, character encoding FOO doesn't exist.

> I did not originally set up the box (actually there are 6 or 8
> of them) but I think that LANG=C was done cos there was
> a problem with LANG-en_us.

Anything is possible, but CentOS 3.8 isn't that old.

In your OP you write:
« However there are some special characters (u with 2 dots
» overhead, for example) in the data which appear as ? in
» the linux file created.
Is that a normal question mark, or is it inverse (white in a black hexagon or square), like this: �

In the latter case, all you would have to do is convert the output from the db application with 'iconv -fcp1252 -tutf8'.
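A minimal sketch of that conversion, assuming glibc iconv. Since the real export is not available, a hypothetical helper `sample_cp1252` fakes a few CP1252 bytes: 0xEB (ë) and 0xF1 (ñ), which CP1252 shares with ISO-8859-1, plus 0x93 and 0x94 (curly double quotes), which only CP1252 defines. `to_utf8` is likewise a hypothetical name.

```shell
#!/bin/sh
# Sketch: re-encode CP1252 text as UTF-8 with iconv, as suggested above.
# sample_cp1252 is a stand-in for the db application's output (hypothetical).
sample_cp1252() { printf '\353\361 \223ok\224\n'; }   # bytes for: ëñ “ok”

# Same effect as 'iconv -fcp1252 -tutf8', spelled with the long charset names.
to_utf8() { iconv -f CP1252 -t UTF-8; }

# In a UTF-8 terminal this prints: ëñ “ok”
sample_cp1252 | to_utf8

# Show the UTF-8 bytes produced, to confirm the conversion really happened.
sample_cp1252 | to_utf8 | od -An -tx1 | xargs
```

For a real file the same filter applies, e.g. `iconv -f CP1252 -t UTF-8 export.txt > export-utf8.txt`, where both file names are placeholders.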
From: syd_p on 28 Sep 2009 06:52

On 28 Sep, 01:04, Marcel Bruinsma <m...(a)nomail.afraid.org> wrote:
> Is that a normal question mark, or is it inverse (white in
> a black hexagon or square), like this : �

Thanks!!!!!

It is a normal question mark.

I entered the commands as suggested:

LANG=en_US.CP1252 locale      -> Bad
LANG=en_US.ISO-8859-15 locale -> Good
LANG=en_US.ISO-8859-1 locale  -> Good
LANG=en_US.UTF-8 locale       -> Good

$ LANG=en_US.CP1252 locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_PAPER="en_US.CP1252"
LC_NAME="en_US.CP1252"
LC_ADDRESS="en_US.CP1252"
LC_TELEPHONE="en_US.CP1252"
LC_MEASUREMENT="en_US.CP1252"
LC_IDENTIFICATION="en_US.CP1252"
LC_ALL=
$ LANG=en_US.ISO-8859-15 locale
LANG=en_US.ISO-8859-15
LC_CTYPE="en_US.ISO-8859-15"
LC_NUMERIC="en_US.ISO-8859-15"
LC_TIME="en_US.ISO-8859-15"
LC_COLLATE="en_US.ISO-8859-15"
LC_MONETARY="en_US.ISO-8859-15"
LC_MESSAGES="en_US.ISO-8859-15"
LC_PAPER="en_US.ISO-8859-15"
LC_NAME="en_US.ISO-8859-15"
LC_ADDRESS="en_US.ISO-8859-15"
LC_TELEPHONE="en_US.ISO-8859-15"
LC_MEASUREMENT="en_US.ISO-8859-15"
LC_IDENTIFICATION="en_US.ISO-8859-15"
LC_ALL=
$ LANG=en_US.ISO-8859-1 locale
LANG=en_US.ISO-8859-1
LC_CTYPE="en_US.ISO-8859-1"
LC_NUMERIC="en_US.ISO-8859-1"
LC_TIME="en_US.ISO-8859-1"
LC_COLLATE="en_US.ISO-8859-1"
LC_MONETARY="en_US.ISO-8859-1"
LC_MESSAGES="en_US.ISO-8859-1"
LC_PAPER="en_US.ISO-8859-1"
LC_NAME="en_US.ISO-8859-1"
LC_ADDRESS="en_US.ISO-8859-1"
LC_TELEPHONE="en_US.ISO-8859-1"
LC_MEASUREMENT="en_US.ISO-8859-1"
LC_IDENTIFICATION="en_US.ISO-8859-1"
LC_ALL=
$ LANG=en_US.UTF-8 locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
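The Good/Bad check above was done by eye; it can be scripted. The sketch below is a hypothetical helper, not something from the thread: it runs `locale` under each candidate LANG (with LC_ALL cleared so it cannot override LANG) and reports "Bad" whenever glibc prints a "Cannot set ..." warning like the ones shown. It assumes glibc's locale(1), English warning text, and no other LC_* variables set in the environment; `check_locale` is an illustrative name.

```shell
#!/bin/sh
# Sketch: automate the "LANG=... locale" test from the thread.
# Assumes glibc's locale(1), which warns "Cannot set LC_... to default
# locale" on stderr when the requested locale is not installed, and that
# no other LC_* variables are set (they would take precedence over LANG).

check_locale() {
    # Send stdout to /dev/null, pipe only stderr to grep, look for the warning.
    if LANG="$1" LC_ALL= locale 2>&1 >/dev/null | grep -q 'Cannot set'; then
        echo Bad
    else
        echo Good
    fi
}

for l in en_US.CP1252 en_US.ISO-8859-15 en_US.ISO-8859-1 en_US.UTF-8; do
    printf '%-22s -> %s\n' "LANG=$l" "$(check_locale "$l")"
done
```

Nothing here needs a reboot: each test only sets LANG for a single command.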
From: syd_p on 29 Sep 2009 04:31

OK, I got this:

LANG=en_US.CP1252 locale      -> Bad
LANG=en_US.ISO-8859-15 locale -> Good
LANG=en_US.ISO-8859-1 locale  -> Good
LANG=en_US.UTF-8 locale       -> Good

But I am not sure how to proceed. I have this entry from the CP1252 table, "ë 00EB 235", which I want to handle in CentOS 3.8. And glibc supports CP1252:

$ locale -m | grep '^CP'
....
CP1252

But "LANG=en_US.CP1252 locale" does not work.

Yet with LANG=C, which I thought was only 7 bits, the following printfs work just fine:

$ printf "(octal 353) is the character \0353\n"
(octal 353) is the character ë
$ printf "(octal 361) is the character \0361\n"
(octal 361) is the character ñ

These are two of the characters in the MSSQL db which the application (not open source) outputs as "?".

Puzzled now! Please help!
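One way to look at that last experiment is to inspect the bytes rather than the screen. The sketch below (an illustration, not from the thread) runs printf with LANG and LC_ALL set to C and pipes the result through od. Octal escapes in a printf format string become raw bytes regardless of the locale, so the output is simply 0xEB, 0xF1 and a newline; whether those bytes then appear as ë and ñ depends on the encoding the terminal uses (ISO-8859-1 and CP1252 both map them to those letters), not on LANG. The sketch uses the POSIX `\353` form of the escape; `emit` is a hypothetical helper name. Separately, `locale -m` lists the charmaps glibc can use, whereas `locale -a` lists the locales actually installed, which may explain why CP1252 appears in the first list while `LANG=en_US.CP1252` is still rejected.

```shell
#!/bin/sh
# Sketch: printf octal escapes produce raw bytes, whatever the locale says.
# LANG=C changes how programs classify and display characters; it does not
# stop 8-bit bytes from being written to the terminal.

emit() { LANG=C LC_ALL=C printf '\353\361\n'; }   # octal 353 = 0xEB, 361 = 0xF1

emit | od -An -tx1 | xargs    # the bytes actually written, in hex
```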