From: Bruce Momjian on
Hiroshi Inoue wrote:
> >>>> I need someone with WIN32 experience to review and test this patch.
> >>> I don't understand why cache_locale_time() works on Windows. It sets
> >>> the LC_CTYPE but does not do any encoding coversion.
> >> Doesn't strftime_win32 do the conversion?
> >
> > Oh, I now see strftime is redefined as a macro in that C files. Thanks.
> >
> >>> Do month and
> >>> day-of-week names not work either, or do they work and the encoding
> >>> conversion for numeric/money, e.g. Euro, it not necessary?
> >> db_strdup does the conversion.
> >
> > Should we pull the encoding conversion into a separate function and have
> > strftime_win32() and db_strdup() both call it?
>
> We may be able to pull the conversion WideChars => UTF8 =>
> a PG encoding into an function.

OK, I have created a new function, win32_wchar_to_db_encoding(), to
share the conversion from wide characters to the database encoding.
New patch attached.

> BTW both PGLC_localeconv() and cache_locale_time() save the current
> LC_CTYPE first and restore them just before returning the functions.
> I'm suspicious if it's OK when errors occur in middle of the functions.

Yea, I added a comment questioning if that is a problem.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
From: Takahiro Itagaki on

Bruce Momjian <bruce(a)momjian.us> wrote:

> OK, I have created a new function, win32_wchar_to_db_encoding(), to
> share the conversion from wide characters to the database encoding.
> New patch attached.

Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
db_encoding_strdup() with the function. Like this:

static char *
db_encoding_strdup(const char *str)
{
char *pstr;
char *mstr;

/* convert the string to the database encoding */
pstr = (char *) pg_do_encoding_conversion(
(unsigned char *) str, strlen(str),
GetPlatformEncoding(), GetDatabaseEncoding());
mstr = strdup(pstr);
if (pstr != str)
pfree(pstr);

return mstr;
}

I beleive the code is harmless on all platforms and we can use it
instead of strdup() without any #ifdef WIN32 quotes.


BTW, I found we'd better to add "ANSI_X3.4-1968" as an alias for
PG_SQL_ASCII. My Fedora 12 returns the name when --no-locale is used.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Bruce Momjian on
Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(a)momjian.us> wrote:
>
> > OK, I have created a new function, win32_wchar_to_db_encoding(), to
> > share the conversion from wide characters to the database encoding.
> > New patch attached.
>
> Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> db_encoding_strdup() with the function. Like this:
>
> static char *
> db_encoding_strdup(const char *str)
> {
> char *pstr;
> char *mstr;
>
> /* convert the string to the database encoding */
> pstr = (char *) pg_do_encoding_conversion(
> (unsigned char *) str, strlen(str),
> GetPlatformEncoding(), GetDatabaseEncoding());
> mstr = strdup(pstr);
> if (pstr != str)
> pfree(pstr);
>
> return mstr;
> }
>
> I beleive the code is harmless on all platforms and we can use it
> instead of strdup() without any #ifdef WIN32 quotes.

OK, I don't have any Win32 people testing this patch so if we want this
fixed for 9.0 someone is going to have to test my patch to see that it
works. Can you make the adjustments suggested above to my patch and
test it to see that it works so we can apply it for 9.0?

> BTW, I found we'd better to add "ANSI_X3.4-1968" as an alias for
> PG_SQL_ASCII. My Fedora 12 returns the name when --no-locale is used.

OK.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Takahiro Itagaki on

Bruce Momjian <bruce(a)momjian.us> wrote:

> Takahiro Itagaki wrote:
> > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > db_encoding_strdup() with the function. Like this:
>
> OK, I don't have any Win32 people testing this patch so if we want this
> fixed for 9.0 someone is going to have to test my patch to see that it
> works. Can you make the adjustments suggested above to my patch and
> test it to see that it works so we can apply it for 9.0?

Here is a full patch that can be applied cleanly to HEAD.
Can anyone test it on Windows?

I'm not sure why temporary changes of lc_ctype was required in the
original patch. The codes are not included in my patch, but please
notice me it is still needed.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

From: Bruce Momjian on
Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(a)momjian.us> wrote:
>
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> >
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works. Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
>
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
>
> I'm not sure why temporary changes of lc_ctype was required in the
> original patch. The codes are not included in my patch, but please
> notice me it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to do the conversion of localenv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes the
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE. We could avoid the caching but that
then involves complex table lookups, etc, which seems overly complex:

+ /* convert the string to the database encoding */
+ pstr = (char *) pg_do_encoding_conversion(
+ (unsigned char *) str, strlen(str),
+ GetPlatformEncoding(), GetDatabaseEncoding());

Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.

The reason we are doing all this is because of this updated comment in
my patch:

ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+ * Ideally, monetary and numeric local symbols could be returned in
+ * any server encoding. Unfortunately, the WIN32 API does not allow
+ * setlocale() to return values in a codepage/CTYPE that uses more
+ * than two bytes per character, like UTF-8:
+ *
+ * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+ *
+ * Evidently, LC_CTYPE allows us to control the encoding used
+ * for strings returned by localeconv(). The Open Group
+ * standard, mentioned at the top of this C file, doesn't
+ * explicitly state this.
+ *
+ * Therefore, we set LC_CTYPE to match LC_NUMERIC and
+ * LC_MONETARY, call localeconv(), and use mbstowcs() to
+ * convert the locale-aware string, e.g. Euro symbol (which
+ * is not in UTF-8), to the server encoding.

One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.

--
Bruce Momjian <bruce(a)momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers