From: Pavel Stehule on
2010/1/19 Robert Haas <robertmhaas(a)gmail.com>:
> On Tue, Jan 19, 2010 at 1:43 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas(a)gmail.com> writes:
>>> Updated patch attached.  I still think this is a bizarre API.
>>
>> Well, if we had it to do over I'm sure we'd have done it differently,
>> but the functions are there now and we aren't going to change them ...
>
> I agree, but I don't feel the need to repeat the mistakes we've
> already made by slavishly copying them into new places.  I think as
> long as we're adding a new function, we should make it behave sanely.
> We could even take the opportunity to go back and add a saner version
> of PQescapeStringConn.

Sometimes is better do nothing :(. I understand you, my first code was
same as your, but any change of basic API is the terrible thing. This
function is used in PHP driver, and bilions web pages. Theoretically
we can work on new libpq with richer API - there could be better
support of binary form, smarter quoting functions. But actually there
are no main request.

Regards
Pavel



>
> ...Robert
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> ... I think as
>> long as we're adding a new function, we should make it behave sanely.
>> We could even take the opportunity to go back and add a saner version
>> of PQescapeStringConn.
>
> Well, it's a bit late in the devel cycle to be inventing from scratch,
> but if we did want to do something "saner" what would it look like?
> I can see the following possibilities:
>
> * include boundary quotes (and E too in the literal case).  This would
> imply telling people they should leave whitespace around the value in
> the constructed query ... or should we insert leading/trailing spaces
> to prevent that mistake too?

Does the existing PQescapeStringConn() require E'' rather than just
''? It isn't documented so.

I think inserting the whitespace would be overkill, but that's another
thing that's not mentioned in the present documentation and should be,
because it would not be difficult to screw it up.

> * malloc our own result value instead of expecting caller to provide
> space.  This adds another failure case (out of memory) but it's not
> really much different from OOM on the caller side.

I think that would be an anti-improvement - caller might have their
own malloc substitute, might want to stack-allocate, etc. Plus we'd
either have to malloc the worst-case buffer size or else read the
string twice. But see below...

> * anything else you want to change?

Instead of relying on the output buffer to be exactly 2n+1 or 2n+3 or
whatever, I suggest that we add an explicit argument for the size of
the output buffer. If the buffer the caller provides is too short,
ideally we should let the caller know about the error and also
indicate how much space would have been necessary (like sprintf).

> I suggest we could use PQescapeLiteral + PQescapeIdentifier as names
> if we provide a new pair of functions defined with a different API.

I like those names. Maybe something like:

size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char
*from, size_t fromlen, int *error);
size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char
*from, size_t fromlen, int *error);

On success, *error is 0 and size_t is the number of bytes actually
written. On failure, *error is non-zero, conn->errorMessage is an
appropriate error, and the return value is the shortest value of tolen
adequate to hold the result, or 0 if we bombed out for some reason
other than lack of output space. That way, if people are willing to
incur the overhead of an extra call, they can malloc() (or whatever)
exactly the right amount of space. Or they can just size the buffer
according to the documentation and then they're safe.

Another option would be to swap the return value and *error:

int PQescapeLiteral(PGconn *conn, char *to, int tolen, char *from, int
fromlen, size_t *needed);
int PQescapeIdentifier(PGconn *conn, char *to, int tolen, char *from,
int fromlen, size_t *needed);

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> * include boundary quotes (and E too in the literal case). �This would
>> imply telling people they should leave whitespace around the value in
>> the constructed query ... or should we insert leading/trailing spaces
>> to prevent that mistake too?

> Does the existing PQescapeStringConn() require E'' rather than just
> ''? It isn't documented so.

It does not expect E'', and would get it wrong if you supplied E'',
which is an anti-feature because it makes it impossible to avoid
escape_string_warning messages. If we were to supply quotes I think
we should also supply E whenever we need to use a backslash.

> I think inserting the whitespace would be overkill, but that's another
> thing that's not mentioned in the present documentation and should be,
> because it would not be difficult to screw it up.

As long as it's documented it's okay ... probably ... note that
conditionally inserting E would get us right back to the place where
an unsafe usage might appear to work fine in light testing. Maybe
prepend a space only if prepending E?

>> * malloc our own result value instead of expecting caller to provide
>> space. �This adds another failure case (out of memory) but it's not
>> really much different from OOM on the caller side.

> I think that would be an anti-improvement - caller might have their
> own malloc substitute, might want to stack-allocate, etc. Plus we'd
> either have to malloc the worst-case buffer size or else read the
> string twice. But see below...

We already have multiple functions in the API that do malloc(). I think
that requiring the caller to estimate the size of the space is a bigger
anti-feature than using malloc.

> I like those names. Maybe something like:

> size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char
> *from, size_t fromlen, int *error);
> size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char
> *from, size_t fromlen, int *error);

This would get a *lot* simpler if we malloced the result space.

char *PQescapeXXX(PQconn *conn, const char *str, size_t len)

with NULL result used to signal any failure (and a message left in
the conn).

> ... Or they can just size the buffer
> according to the documentation and then they're safe.

Wrong, one thing we would absolutely not do is document any hard
guarantee about the maximum space needed. That's isomorphic to
documenting the exact escaping algorithm, and doing that has already
bit us more times than you know. If we're going to alter the API at
all I think we absolutely *must* get rid of that design error.

Keep in mind that a large part of wanting to have these functions at all
is for simplicity in the calling code. Complicating their API for
marginal gains doesn't seem to me to be the right design direction.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Tue, Jan 19, 2010 at 2:38 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> As long as it's documented it's okay ... probably ... note that
> conditionally inserting E would get us right back to the place where
> an unsafe usage might appear to work fine in light testing.  Maybe
> prepend a space only if prepending E?

That's a bit of a kludge, but maybe it's liveable.

>>> * malloc our own result value instead of expecting caller to provide
>>> space.  This adds another failure case (out of memory) but it's not
>>> really much different from OOM on the caller side.
>
>> I think that would be an anti-improvement - caller might have their
>> own malloc substitute, might want to stack-allocate, etc.  Plus we'd
>> either have to malloc the worst-case buffer size or else read the
>> string twice.  But see below...
>
> We already have multiple functions in the API that do malloc().  I think
> that requiring the caller to estimate the size of the space is a bigger
> anti-feature than using malloc.
>
>> I like those names.  Maybe something like:
>
>> size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char
>> *from, size_t fromlen, int *error);
>> size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char
>> *from, size_t fromlen, int *error);
>
> This would get a *lot* simpler if we malloced the result space.
>
>        char *PQescapeXXX(PQconn *conn, const char *str, size_t len)
>
> with NULL result used to signal any failure (and a message left in
> the conn).

Well, I have to admit that the simplicity of that API is very
appealing. Someone will probably eventually request
PQescapeXXXButUseMyMalloc(PQConn *conn, const char *str, size_t len,
void (*MyMalloc)(size_t)); ... but at least until we change the
escaping algorithm again, we can always tell that person to use
PQescapeStringConn() and roll their own. So, I'm sold.

Do you think we should malloc 2n+X bytes all the time, or do you want
to scan the string once to determine how much space is needed and then
allocate exactly that much space? It seems to me that it might be
sensible to do it this way:

1. Do a locale-aware scan of the input string and count the number of
characters we need to escape (num_to_escape).
2. If num_to_escape is 0, fast path: allocate len+3 bytes and use
memcpy to copy the input data to the output buffer.
3. Otherwise, allocate len+num_to_escape+5 bytes
(space-E-quote-quote-NUL) and do a second locale-aware scan of the
input string, copying and escaping as we go (or do you only want the
space-E if the escapable character that we find is \ rather than '?).

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> Do you think we should malloc 2n+X bytes all the time, or do you want
> to scan the string once to determine how much space is needed and then
> allocate exactly that much space?

I'd vote for two scans, as I think we'll soon decide 2n doesn't cut it
anyway. One of the issues that needs to be looked at is embedded null
characters. We might fail on that for the moment, but I can foresee
wanting to send \000 instead. You don't want to pay for 4x do you?

> It seems to me that it might be
> sensible to do it this way:

> 1. Do a locale-aware scan of the input string and count the number of
> characters we need to escape (num_to_escape).
> 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use
> memcpy to copy the input data to the output buffer.
> 3. Otherwise, allocate len+num_to_escape+5 bytes
> (space-E-quote-quote-NUL) and do a second locale-aware scan of the
> input string, copying and escaping as we go (or do you only want the
> space-E if the escapable character that we find is \ rather than '?).

Yeah, the fast path if no escaping is needed seems attractive.

I think that E should only be inserted if we have a backslash to deal
with; no reason to generate nonstandard syntax if we don't have to.
That would mean keeping two bits of state (escaping-needed and
malloc-size) from the initial pass, but that's pretty cheap.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers