LANG setting for MS CP 1252 [Setup]


From: syd_p on 24 Sep 2009 05:57

On 6 Aug, 03:51, Nico Kadel-Garcia <nka...(a)gmail.com> wrote:
> On Aug 5, 6:20 pm, syd_p <sydneypue...(a)yahoo.com> wrote:
>
> > Hello,
> > I have an application running on CentOS 3.8 which brings back some data
> > from a MS SQL server db, and writes it to disk.
>
> OK. Stop right there. *WHY* are you using a 6 year old operating
> system for anything you care about? Seriously, if at all possible,
> update to CentOS 4.7 at a minimum, preferably 5.3. You'll get much
> better international language support.
>
>
>
> > However there are some special characters ( u with 2 dots overhead,
> > for example) in the data which appear as ? in the linux file created.
> > I am told the database uses CP 1252, which means the u with 2 dots
> > overhead is character 252.
>
> > The output of locale is;
> > $ locale
> > LANG=C
> > LC_CTYPE="C"
> > LC_NUMERIC="C"
> > LC_TIME="C"
> > LC_COLLATE="C"
> > LC_MONETARY="C"
> > LC_MESSAGES="C"
> > LC_PAPER="C"
> > LC_NAME="C"
> > LC_ADDRESS="C"
> > LC_TELEPHONE="C"
> > LC_MEASUREMENT="C"
> > LC_IDENTIFICATION="C"
> > LC_ALL=
>
> > What can I do to fix this problem please?
>
> Well, it depends. The strings you are handling are not 7-bit ASCII
> text, which is what the 'C' format is generally for, they're
> effectively binary data. Treat them as such. If you need them to be
> visible, consider setting your LANG and other settings to German or
> whatever language with umlauts they were originally written in.
>
> What are you passing this data to? Is it possible that your viewer for
> the Linux text file is simply mishandling the generated non-English
> character set?
Thanks very much for your response - which I missed until now.
This is still a problem - just more urgent.

> *WHY* are you using a 6 year old operating system?

Well, it is a 5-year-old install, and only now are we being fed data
with "odd" characters.
There is a new 5.3 platform coming soon, but the old system will be
around until next year at least.

Based on the example I mentioned, using LANG=de would be a possible
solution.
But we are seeing French, Spanish and German "special" characters,
all of which are supported by MS's CP 1252.

Any other ideas?

TIA

Syd

From: syd_p on 24 Sep 2009 05:59

On 24 Sep, 10:57, syd_p <sydneypue...(a)yahoo.com> wrote:
> On 6 Aug, 03:51, Nico Kadel-Garcia <nka...(a)gmail.com> wrote:
>
> > On Aug 5, 6:20 pm, syd_p <sydneypue...(a)yahoo.com> wrote:
>
> > > Hello,
> > > > I have an application running on CentOS 3.8 which brings back some data
> > > from a MS SQL server db, and writes it to disk.
>
> > OK. Stop right there. *WHY* are you using a 6 year old operating
> > system for anything you care about? Seriously, if at all possible,
> > update to CentOS 4.7 at a minimum, preferably 5.3. You'll get much
> > better international language support.
>
> > > However there are some special characters ( u with 2 dots overhead,
> > > for example) in the data which appear as ? in the linux file created.
> > > I am told the database uses CP 1252, which means the u with 2 dots
> > > > overhead is character 252.
>
> > > The output of locale is;
> > > $ locale
> > > LANG=C
> > > LC_CTYPE="C"
> > > LC_NUMERIC="C"
> > > LC_TIME="C"
> > > LC_COLLATE="C"
> > > LC_MONETARY="C"
> > > LC_MESSAGES="C"
> > > LC_PAPER="C"
> > > LC_NAME="C"
> > > LC_ADDRESS="C"
> > > LC_TELEPHONE="C"
> > > LC_MEASUREMENT="C"
> > > LC_IDENTIFICATION="C"
> > > LC_ALL=
>
> > > What can I do to fix this problem please?
>
> > Well, it depends. The strings you are handling are not 7-bit ASCII
> > text, which is what the 'C' format is generally for, they're
> > effectively binary data. Treat them as such. If you need them to be
> > visible, consider setting your LANG and other settings to German or
> > whatever language with umlauts they were originally written in.
>
> > What are you passing this data to? Is it possible that your viewer for
> > the Linux text file is simply mishandling the generated non-English
> > character set?
>
> Thanks very much for your response - which I missed until now.
> This is still a problem - just more urgent.
>
> *WHY* are you using a 6 year old operating system?
> Well it is a 5 year old install and only now are we being fed data
> with "odd" characters.
> There is a new 5.3 platform coming soon - but the old system will be
> around until next year at least.
>
> Based on the example I mentioned using LANG=de would be a possible
> solution.
> But we are seeing French, Spanish and German "special" characters
> which are supported by MS's CP 1252.
>
> Any other ideas?
>
> TIA
>
> Syd
PS
I just used od (octal dump) to look at the output: the viewer is not at
fault. The blame lies elsewhere (with me, probably ;-)
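
For anyone repeating the od check, here is a minimal Python sketch of the same idea. It assumes the character in question is the u-umlaut from the thread (byte 252, octal 374, in CP 1252); the helper names `dump_octal` and `count_suspects` are made up for illustration. If the dump shows 077 where 374 was expected, a literal "?" was written to the file, so the character was lost before it reached the disk and no viewer setting can bring it back.

```python
def dump_octal(data: bytes) -> str:
    """Return the bytes as three-digit octal values, similar to `od -b`."""
    return " ".join(f"{byte:03o}" for byte in data)


def count_suspects(data: bytes) -> dict:
    """Count literal '?' bytes and bytes above 127 in the data."""
    return {
        "question_marks": data.count(b"?"),
        "high_bytes": sum(1 for byte in data if byte > 127),
    }


# "M\u00fcller" is "Mueller" spelled with a u-umlaut (illustrative sample text).
intact = "M\u00fcller".encode("cp1252")
damaged = "M\u00fcller".encode("ascii", errors="replace")

print(dump_octal(intact))    # 115 374 154 154 145 162
print(dump_octal(damaged))   # 115 077 154 154 145 162
print(count_suspects(damaged))
```

A file full of high bytes points at the viewer or terminal; a file full of 077 points at a conversion step earlier in the chain.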

From: Marcel Bruinsma on 24 Sep 2009 17:41

On Thursday 24 September 2009 11:57, syd_p wrote:

> Based on the example I mentioned using LANG=de would be
> a possible solution.

No, the default CTYPE for de is ISO-8859-1.

> But we are seeing French, Spanish and German "special"
> characters which are supported by MS's CP 1252.

Check whether your libc supports CP1252:

$ locale -m | grep '^CP'
CP10007
CP1125
CP1250
CP1251
CP1252
CP1253
CP1254
CP1255
CP1256
CP1257
CP1258
CP737
CP775
CP949

If it does: LANG=en_US.CP1252
Of course, you can replace "en_US" with whatever you prefer.
The important part here is the ".CP1252", which defines the
locale's character set (and encoding). This is independent
of language (the "en") and region (the "_US").
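
To illustrate the three independent parts of a locale name, here is a small Python sketch. The function names `split_locale` and `decode_with_codeset` are hypothetical, and the sketch uses Python's own "cp1252" codec rather than the libc charmap, so it only demonstrates the principle: the codeset part alone decides how byte 252 is interpreted, whatever the language and region are. Locale modifiers such as "@euro" are not handled.

```python
def split_locale(name: str):
    """Split 'language_REGION.CODESET' into parts; missing parts become None."""
    base, _, codeset = name.partition(".")
    language, _, region = base.partition("_")
    return language or None, region or None, codeset or None


def decode_with_codeset(data: bytes, locale_name: str) -> str:
    """Decode bytes using only the codeset part of a locale name."""
    _, _, codeset = split_locale(locale_name)
    if codeset is None:
        raise ValueError(f"{locale_name!r} names no codeset")
    return data.decode(codeset)


print(split_locale("en_US.CP1252"))   # ('en', 'US', 'CP1252')
print(split_locale("C"))              # ('C', None, None)

# Byte 252 decodes to the same character under any language/region pair.
for name in ("en_US.CP1252", "de_DE.CP1252", "fr_FR.CP1252"):
    print(name, decode_with_codeset(b"\xfc", name) == "\u00fc")
```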

--
printf -v email $(echo \ 155 141 162 143 145 154 142 162 165 151 \
156 163 155 141 100 171 141 150 157 157 056 143 157 155|tr \ \\\\)
# Live every life as if it were your last! #

From: Bill Marcum on 24 Sep 2009 19:06

On 2009-09-24, Marcel Bruinsma <mb(a)nomail.afraid.org> wrote:
> On Thursday 24 September 2009 11:57, syd_p wrote:
>
>> Based on the example I mentioned using LANG=de would be
>> a possible solution.
>
> No, the default CTYPE for de is ISO-8859-1.
>
CP1252 is a superset of ISO-8859-1. The accented letters are the same.
CP1252 has additional punctuation marks and the euro and trademark
symbols, among other things (code values 128-159, which are undefined
in the ISO-8859-* character sets).
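
The relationship between the two encodings can be checked byte by byte. The sketch below relies on Python's built-in "cp1252" and "latin-1" codecs, which is an assumption about tooling and not something from the thread. Python's latin-1 codec maps 128-159 to C1 control characters, and its cp1252 codec leaves five values in that range unassigned. The function name `compare_cp1252_latin1` is made up for illustration.

```python
def compare_cp1252_latin1():
    """Classify each byte value by how CP1252 and ISO-8859-1 decode it."""
    same, different, undefined = [], [], []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for this byte.
            undefined.append(value)
            continue
        if cp1252_char == latin1_char:
            same.append(value)
        else:
            different.append(value)
    return same, different, undefined


same, different, undefined = compare_cp1252_latin1()
print(len(same), len(different), len(undefined))   # 224 27 5
print([hex(value) for value in undefined])
```

All differences fall in 128-159. The accented letters (192-255) and the copyright sign (169) decode identically in both, while the trademark sign (153) and the euro sign (128) exist only in CP1252.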
