From: Peter Eisentraut
On Fri, 2010-03-19 at 11:50 -0400, Andrew Dunstan wrote:
> Peter Eisentraut wrote:
> > Log Message:
> > -----------
> > Prevent the injection of invalidly encoded strings by PL/Python into PostgreSQL
> > with a few strategically placed pg_verifymbstr calls.

> Awesome. Do we need to fix pltcl too?

Short answer: yes.

I have never used Tcl before just now, and the documentation is sketchy,
but it looks like the behavior of Tcl is kind of mixed in this area.

Escapes such as "\xd0" are apparently converted to Unicode code points
rather than raw bytes when the appropriate OS locale is set, so that case
is safe. It does not work in some locale/charset setups, however, such
as EUC_JP. To adapt Hannu's original example:

CREATE TABLE utf_test
(
id serial PRIMARY KEY,
data character varying
);

CREATE OR REPLACE FUNCTION invalid_utf_seq()
RETURNS character varying AS
$BODY$
return "\xd0";
$BODY$
LANGUAGE 'pltclu' VOLATILE STRICT;

insert into utf_test(data) values(invalid_utf_seq());

-- This works in UTF8 and LATIN1 with the right locales, but ...

select invalid_utf_seq();
ERROR: 22021: invalid byte sequence for encoding "EUC_JP": 0xc390
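
To spell out where the 0xc390 in that error comes from, here is a rough
sketch in Python (not PL/Tcl, and not what the server actually runs). It
assumes that Tcl turns "\xd0" into the code point U+00D0 and that the
UTF-8 form of that code point reaches the server unconverted; that is an
inference from the error message above, not something confirmed from the
Tcl or PL/Tcl source. The helper is_valid_in is a made-up name, and
Python's codecs only loosely stand in for what pg_verifymbstr checks.

```python
# Illustration only: Python codecs stand in for the server-side encoding check.


def is_valid_in(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `data` decodes cleanly in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Assumption: Tcl reads "\xd0" as the code point U+00D0, not as the byte 0xd0.
code_point = "\u00d0"
utf8_form = code_point.encode("utf-8")  # the two bytes 0xc3 0x90

# For comparison: the raw byte 0xd0 on its own, as in the PL/Python case.
raw_byte = b"\xd0"

for label, data, encoding in [
    ("U+00D0 as UTF-8, checked as UTF-8", utf8_form, "utf-8"),
    ("U+00D0 as UTF-8, checked as EUC_JP", utf8_form, "euc_jp"),
    ("raw byte 0xd0, checked as UTF-8", raw_byte, "utf-8"),
]:
    print(f"{label}: bytes=0x{data.hex()} valid={is_valid_in(data, encoding)}")
```

If that assumption holds, the UTF8 case passes because the bytes already
match the database encoding, while in an EUC_JP database the same two
bytes are not a valid sequence, which matches the error shown above.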

