From: Heikki Linnakangas on
Takahiro Itagaki wrote:
> I heard pg_get_encoding_from_locale() failed in the kor locale.
>
> WARNING: could not determine encoding for locale "kor": codeset is "CP949"
>
> I found the following description on the web:
> CP949 is EUC-KR, extended with UHC (Unified Hangul Code).
> http://www.opensource.apple.com/source/libiconv/libiconv-13.2/libiconv/lib/cp949.h
>
> but we define CP51949 for EUC-KR in chklocale.c.
> {PG_EUC_KR, "CP51949"}, /* or 20949 ? */
>
> Which is the compatible codeset with our PG_EUC_KR encoding?
> 949, 51949, or 20949?

A bit of googling suggests that 51949 is indeed the Windows codepage
that's equivalent to EUC-KR.

> Should we add (or replace) CP949 for EUC-KR?

No. CP949 is not plain EUC-KR, but EUC-KR with some extensions (UHC). At
least on CVS HEAD, we recognize CP949 as an alias for the PostgreSQL
PG_UHC encoding. There's a significant difference between the two,
because PG_EUC_KR is supported as a server encoding while PG_UHC is not.
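
To make the "EUC-KR plus extensions" point concrete, here is a rough sketch
that classifies a two-byte sequence by byte range only. It is not PostgreSQL
code: the names classify_kr_pair, kr_pair_class and uhc_ext_trail are made up
for illustration, and the ranges are the commonly documented CP949 layout
(EUC-KR pairs use lead and trail bytes 0xA1-0xFE; UHC adds lead bytes
0x81-0xC6 with extra trail bytes). It does not check whether a given pair is
actually assigned to a character.

```c
#include <stdbool.h>

/* Result of a byte-range check; names are hypothetical, not from PostgreSQL. */
typedef enum
{
    KR_INVALID,         /* not a plausible two-byte Korean sequence */
    KR_EUC_KR_RANGE,    /* falls in the plain EUC-KR (KS X 1001) range */
    KR_UHC_EXTENSION    /* only reachable through the CP949/UHC extension */
} kr_pair_class;

/* Trail bytes that CP949/UHC adds below the EUC-KR trail range. */
static bool
uhc_ext_trail(unsigned char b)
{
    return (b >= 0x41 && b <= 0x5A) ||
           (b >= 0x61 && b <= 0x7A) ||
           (b >= 0x81 && b <= 0xA0);
}

/* Classify a lead/trail pair by range; assumes the layout described above. */
kr_pair_class
classify_kr_pair(unsigned char lead, unsigned char trail)
{
    bool euc_trail = (trail >= 0xA1 && trail <= 0xFE);

    /* Plain EUC-KR: both bytes in 0xA1-0xFE. */
    if (lead >= 0xA1 && lead <= 0xFE && euc_trail)
        return KR_EUC_KR_RANGE;

    /* UHC extension, lead bytes 0x81-0xA0: any extended or EUC-style trail. */
    if (lead >= 0x81 && lead <= 0xA0 && (uhc_ext_trail(trail) || euc_trail))
        return KR_UHC_EXTENSION;

    /* UHC extension, lead bytes 0xA1-0xC6: only the added trail bytes. */
    if (lead >= 0xA1 && lead <= 0xC6 && uhc_ext_trail(trail))
        return KR_UHC_EXTENSION;

    return KR_INVALID;
}
```

Any pair that lands in KR_UHC_EXTENSION has no counterpart in plain EUC-KR,
which is why the two encodings cannot simply be treated as aliases.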

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Takahiro Itagaki on

Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote:

> > Should we add (or replace) CP949 for EUC-KR?
>
> No. CP949 is not plain EUC-KR, but EUC-KR with some extensions (UHC). At
> least on CVS HEAD, we recognize CP949 as an alias for the PostgreSQL
> PG_UHC encoding.

That's it! We should have added an additional alias to chklocale, too.

Index: src/port/chklocale.c
===================================================================
--- src/port/chklocale.c (HEAD)
+++ src/port/chklocale.c (fixed)
@@ -172,6 +172,7 @@
{PG_GBK, "CP936"},

{PG_UHC, "UHC"},
+ {PG_UHC, "CP949"},

{PG_JOHAB, "JOHAB"},
{PG_JOHAB, "CP1361"},
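
For readers unfamiliar with the table, the effect of the added line can be
sketched as a plain alias lookup. This is a simplified stand-in, not the
actual chklocale.c code: the enum sketch_enc, the struct enc_alias and the
functions names_equal_ci and lookup_codeset are hypothetical, the table holds
only the entries quoted in this thread, and the case-insensitive comparison
is an assumption.

```c
#include <stddef.h>
#include <ctype.h>

/* Hypothetical stand-ins for the PG_* encoding IDs mentioned in the thread. */
typedef enum
{
    ENC_UNKNOWN = -1,
    ENC_EUC_KR,
    ENC_GBK,
    ENC_UHC,
    ENC_JOHAB
} sketch_enc;

struct enc_alias
{
    sketch_enc  enc;
    const char *name;       /* codeset name as reported by the system */
};

/* Only the aliases quoted in this thread, including the new CP949 line. */
static const struct enc_alias alias_table[] = {
    {ENC_EUC_KR, "CP51949"},
    {ENC_GBK, "CP936"},
    {ENC_UHC, "UHC"},
    {ENC_UHC, "CP949"},
    {ENC_JOHAB, "JOHAB"},
    {ENC_JOHAB, "CP1361"},
};

/* Portable ASCII case-insensitive string equality. */
static int
names_equal_ci(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;        /* equal only if both strings ended together */
}

/* Return the encoding for a codeset name, or ENC_UNKNOWN if none matches. */
sketch_enc
lookup_codeset(const char *codeset)
{
    size_t      i;

    for (i = 0; i < sizeof(alias_table) / sizeof(alias_table[0]); i++)
    {
        if (names_equal_ci(alias_table[i].name, codeset))
            return alias_table[i].enc;
    }
    return ENC_UNKNOWN;
}
```

With the CP949 entry present, lookup_codeset("CP949") resolves to the UHC
entry instead of falling through to the unknown case, which is the situation
that produced the warning for the kor locale.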


Except for UHC, we don't have any codepage aliases for the encodings below.
I assume we don't need to add CPxxx because Windows does not have
corresponding codepages for them, right?

{PG_LATIN6, "ISO-8859-10"},
{PG_LATIN7, "ISO-8859-13"},
{PG_LATIN8, "ISO-8859-14"},
{PG_LATIN10, "ISO-8859-16"},
{PG_SHIFT_JIS_2004, "SJIS_2004"},

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center




From: Heikki Linnakangas on
Takahiro Itagaki wrote:
> That's it! We should have added an additional alias to chklocale, too.
>
> Index: src/port/chklocale.c
> ===================================================================
> --- src/port/chklocale.c (HEAD)
> +++ src/port/chklocale.c (fixed)
> @@ -172,6 +172,7 @@
> {PG_GBK, "CP936"},
>
> {PG_UHC, "UHC"},
> + {PG_UHC, "CP949"},
>
> {PG_JOHAB, "JOHAB"},
> {PG_JOHAB, "CP1361"},

Yeah, seems correct.

> Except for UHC, we don't have any codepage aliases for the encodings below.
> I assume we don't need to add CPxxx because Windows does not have
> corresponding codepages for them, right?
>
> {PG_LATIN6, "ISO-8859-10"},
> {PG_LATIN7, "ISO-8859-13"},
> {PG_LATIN8, "ISO-8859-14"},
> {PG_LATIN10, "ISO-8859-16"},
> {PG_SHIFT_JIS_2004, "SJIS_2004"},

Yeah, I guess so. I couldn't find Windows codepages for these by
googling, either.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Ioseph Kim" on
Hi, I'm Korean.

CP51949 is indeed EUC-KR,
so the code as defined is correct.

But in Korea, EUC-KR cannot represent all Korean characters.
In recent years, many people in Korea have been using CP949 instead.
The MS Windows codepage is also CP949.

----- Original Message -----
From: "Takahiro Itagaki" <itagaki.takahiro(a)oss.ntt.co.jp>
To: <pgsql-hackers(a)postgresql.org>
Sent: Tuesday, April 27, 2010 7:27 PM
Subject: [HACKERS] CP949 for EUC-KR?


> I heard pg_get_encoding_from_locale() failed in the kor locale.
>
> WARNING: could not determine encoding for locale "kor": codeset is "CP949"
>
> I found the following description on the web:
> CP949 is EUC-KR, extended with UHC (Unified Hangul Code).
> http://www.opensource.apple.com/source/libiconv/libiconv-13.2/libiconv/lib/cp949.h
>
> but we define CP51949 for EUC-KR in chklocale.c.
> {PG_EUC_KR, "CP51949"}, /* or 20949 ? */
>
> Which is the compatible codeset with our PG_EUC_KR encoding?
> 949, 51949, or 20949? Should we add (or replace) CP949 for EUC-KR?
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
From: Takahiro Itagaki on

"Ioseph Kim" <pgsql-kr(a)postgresql.kr> wrote:

> CP51949 is indeed EUC-KR,
> > {PG_EUC_KR, "CP51949"}, /* or 20949 ? */

Thank you for the information. I removed "or 20949 ?" from the line.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


