From: Pavel Stehule on
2010/1/16 Robert Haas <robertmhaas(a)gmail.com>:
> On Thu, Jan 14, 2010 at 8:46 PM, Robert Haas <robertmhaas(a)gmail.com> wrote:
>> I have yet to fully review the code but on a quick glance it looks reasonable.
>
> On further review, it looks less reasonable.  :-(
>
> The new PQescapeIdentConn function is basically a cut-up version of
> PQescapeStringInternal, which seems like a reasonable foundation, but
> it rips out a little too much - specifically:
>
> 1. the length argument,
> 2. the size_t return value,
> 3. the portion of the handling for incomplete multibyte characters
> that prevents us from overrunning the output buffer on a maliciously
> constructed (or unlucky) input string, and
> 4. some relevant comments.
>

Yes, I didn't repeat parameter's pattern from PQescapeStringConn, I
would to simplify interface but I hasn't problem to modify it to same
interface.

I rewrote patch so now interface for PQescapeIdentConn is same as
PQescapeStringConn

@3. I though so the protection under incomplete multibyte chars are
enought - missing bytes are replaced by space - like
PQescapeStringConn does. But now - mechanism is exactly same, so this
problem should be solved.

Regards
Pavel Stehule


> I'm inclined to think we should put all of that stuff back, but
> certainly #3 at a minimum.
>
> ...Robert
>
From: Robert Haas on
On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
> I rewrote patch so now interface for PQescapeIdentConn is same as
> PQescapeStringConn
>
> @3. I though so the protection under incomplete multibyte chars are
> enought - missing bytes are replaced by space - like
> PQescapeStringConn does.

That much is fine, but the output buffer is only guaranteed to be of
size 2n+1. Imagine the input is two double-quotes followed by a byte
for which pg_encoding_mblen() returns 4. The input is 3 characters
long so the user was responsible to provide 7 bytes of output space,
but you'll try to write 9 bytes to it (including the terminating NUL).

> But now - mechanism is exactly same, so this
> problem should be solved.

This is no better. What the function does no longer matches either
its comments or the documentation (which also contradict each other).

Let me take a crack at this and post a patch. We're making this
harder than it needs to be.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on
2010/1/18 Robert Haas <robertmhaas(a)gmail.com>:
> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>> I rewrote patch so now interface for PQescapeIdentConn is same as
>> PQescapeStringConn
>>
>> @3. I though so the protection under incomplete multibyte chars are
>> enought - missing bytes are replaced by space - like
>> PQescapeStringConn does.
>
> That much is fine, but the output buffer is only guaranteed to be of
> size 2n+1.  Imagine the input is two double-quotes followed by a byte
> for which pg_encoding_mblen() returns 4.  The input is 3 characters
> long so the user was responsible to provide 7 bytes of output space,
> but you'll try to write 9 bytes to it (including the terminating NUL).
>

I don't understand. The "length" is number of bytes, not number of
chars. It is maybe bad documented only. If your input string has 6
bytes, then buffer have to allocated to 13 bytes. Nobody knows how
much is chars there.

>> But now - mechanism is exactly same, so this
>> problem should be solved.
>
> This is no better.  What the function does no longer matches either
> its comments or the documentation (which also contradict each other).
>
> Let me take a crack at this and post a patch.  We're making this
> harder than it needs to be.
>

sure, please.

Pavel

> ...Robert
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
> 2010/1/18 Robert Haas <robertmhaas(a)gmail.com>:
>> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>>> I rewrote patch so now interface for PQescapeIdentConn is same as
>>> PQescapeStringConn
>>>
>>> @3. I though so the protection under incomplete multibyte chars are
>>> enought - missing bytes are replaced by space - like
>>> PQescapeStringConn does.
>>
>> That much is fine, but the output buffer is only guaranteed to be of
>> size 2n+1.  Imagine the input is two double-quotes followed by a byte
>> for which pg_encoding_mblen() returns 4.  The input is 3 characters
>> long so the user was responsible to provide 7 bytes of output space,
>> but you'll try to write 9 bytes to it (including the terminating NUL).
>>
> I don't understand. The "length" is number of bytes, not number of
> chars. It is maybe bad documented only. If your input string has 6
> bytes, then buffer have to allocated to 13 bytes. Nobody knows how
> much is chars there.

Right, but the point is we can't assume that the input is validly
encoded. If the input ends with a garbage character that looks like
the start of a multi-byte character, we can't assume that there's
enough space in the output buffer to store the required number of
padding spaces.

To take an extreme example, suppose there were an encoding where any
time the first byte of a multi-byte character has the high-bit set,
the character is 100 bytes long. Then suppose someone call
PQescapeStringConn(), or this new function we're adding, with a length
argument of 1, and the first byte of the input buffer has the high-bit
set. The caller is only required to provide a 3-byte output buffer,
and the third byte is needed for the terminating NUL. That means that
after we copy that first character we only have room to insert one
padding space. The way you had it coded, since we were expecting a
character 100 bytes long, we'd always try to insert 99 padding spaces.

>> Let me take a crack at this and post a patch.  We're making this
>> harder than it needs to be.
>
> sure, please.

Working on it...

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on
2010/1/18 Robert Haas <robertmhaas(a)gmail.com>:
> On Mon, Jan 18, 2010 at 1:52 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>> 2010/1/18 Robert Haas <robertmhaas(a)gmail.com>:
>>> On Sun, Jan 17, 2010 at 2:04 PM, Pavel Stehule <pavel.stehule(a)gmail.com> wrote:
>>>> I rewrote patch so now interface for PQescapeIdentConn is same as
>>>> PQescapeStringConn
>>>>
>>>> @3. I though so the protection under incomplete multibyte chars are
>>>> enought - missing bytes are replaced by space - like
>>>> PQescapeStringConn does.
>>>
>>> That much is fine, but the output buffer is only guaranteed to be of
>>> size 2n+1.  Imagine the input is two double-quotes followed by a byte
>>> for which pg_encoding_mblen() returns 4.  The input is 3 characters
>>> long so the user was responsible to provide 7 bytes of output space,
>>> but you'll try to write 9 bytes to it (including the terminating NUL).
>>>
>> I don't understand. The "length" is number of bytes, not number of
>> chars. It is maybe bad documented only. If your input string has 6
>> bytes, then buffer have to allocated to 13 bytes. Nobody knows how
>> much is chars there.
>
> Right, but the point is we can't assume that the input is validly
> encoded.  If the input ends with a garbage character that looks like
> the start of a multi-byte character, we can't assume that there's
> enough space in the output buffer to store the required number of
> padding spaces.
>
> To take an extreme example, suppose there were an encoding where any
> time the first byte of a multi-byte character has the high-bit set,
> the character is 100 bytes long.  Then suppose someone call
> PQescapeStringConn(), or this new function we're adding, with a length
> argument of 1, and the first byte of the input buffer has the high-bit
> set.  The caller is only required to provide a 3-byte output buffer,
> and the third byte is needed for the terminating NUL.  That means that
> after we copy that first character we only have room to insert one
> padding space.  The way you had it coded, since we were expecting a
> character 100 bytes long, we'd always try to insert 99 padding spaces.
>

do you speak about previous version?

in current version is garanted new length is <= 2x original length

Pavel

>
>> Let me take a crack at this and post a patch.  We're making this
>>> harder than it needs to be.
>>
>> sure, please.
>
> Working on it...
>
> ...Robert
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers