Prev: Port for Microsoft Services for Unix (SFU) or SUA
Next: [HACKERS] solution to make static changes in pg_hba.conf file?
From: Pavel Stehule on 19 Jan 2010 14:03 2010/1/19 Robert Haas <robertmhaas(a)gmail.com>: > On Tue, Jan 19, 2010 at 1:43 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas(a)gmail.com> writes: >>> Updated patch attached. Â I still think this is a bizarre API. >> >> Well, if we had it to do over I'm sure we'd have done it differently, >> but the functions are there now and we aren't going to change them ... > > I agree, but I don't feel the need to repeat the mistakes we've > already made by slavishly copying them into new places. Â I think as > long as we're adding a new function, we should make it behave sanely. > We could even take the opportunity to go back and add a saner version > of PQescapeStringConn. Sometimes is better do nothing :(. I understand you, my first code was same as your, but any change of basic API is the terrible thing. This function is used in PHP driver, and bilions web pages. Theoretically we can work on new libpq with richer API - there could be better support of binary form, smarter quoting functions. But actually there are no main request. Regards Pavel > > ...Robert > -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 19 Jan 2010 14:24 On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas(a)gmail.com> writes: >> ... I think as >> long as we're adding a new function, we should make it behave sanely. >> We could even take the opportunity to go back and add a saner version >> of PQescapeStringConn. > > Well, it's a bit late in the devel cycle to be inventing from scratch, > but if we did want to do something "saner" what would it look like? > I can see the following possibilities: > > * include boundary quotes (and E too in the literal case). This would > imply telling people they should leave whitespace around the value in > the constructed query ... or should we insert leading/trailing spaces > to prevent that mistake too? Does the existing PQescapeStringConn() require E'' rather than just ''? It isn't documented so. I think inserting the whitespace would be overkill, but that's another thing that's not mentioned in the present documentation and should be, because it would not be difficult to screw it up. > * malloc our own result value instead of expecting caller to provide > space. This adds another failure case (out of memory) but it's not > really much different from OOM on the caller side. I think that would be an anti-improvement - caller might have their own malloc substitute, might want to stack-allocate, etc. Plus we'd either have to malloc the worst-case buffer size or else read the string twice. But see below... > * anything else you want to change? Instead of relying on the output buffer to be exactly 2n+1 or 2n+3 or whatever, I suggest that we add an explicit argument for the size of the output buffer. If the buffer the caller provides is too short, ideally we should let the caller know about the error and also indicate how much space would have been necessary (like sprintf). > I suggest we could use PQescapeLiteral + PQescapeIdentifier as names > if we provide a new pair of functions defined with a different API. I like those names. Maybe something like: size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char *from, size_t fromlen, int *error); size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char *from, size_t fromlen, int *error); On success, *error is 0 and size_t is the number of bytes actually written. On failure, *error is non-zero, conn->errorMessage is an appropriate error, and the return value is the shortest value of tolen adequate to hold the result, or 0 if we bombed out for some reason other than lack of output space. That way, if people are willing to incur the overhead of an extra call, they can malloc() (or whatever) exactly the right amount of space. Or they can just size the buffer according to the documentation and then they're safe. Another option would be to swap the return value and *error: int PQescapeLiteral(PGconn *conn, char *to, int tolen, char *from, int fromlen, size_t *needed); int PQescapeIdentifier(PGconn *conn, char *to, int tolen, char *from, int fromlen, size_t *needed); ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 19 Jan 2010 14:38 Robert Haas <robertmhaas(a)gmail.com> writes: > On Tue, Jan 19, 2010 at 2:01 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: >> * include boundary quotes (and E too in the literal case). �This would >> imply telling people they should leave whitespace around the value in >> the constructed query ... or should we insert leading/trailing spaces >> to prevent that mistake too? > Does the existing PQescapeStringConn() require E'' rather than just > ''? It isn't documented so. It does not expect E'', and would get it wrong if you supplied E'', which is an anti-feature because it makes it impossible to avoid escape_string_warning messages. If we were to supply quotes I think we should also supply E whenever we need to use a backslash. > I think inserting the whitespace would be overkill, but that's another > thing that's not mentioned in the present documentation and should be, > because it would not be difficult to screw it up. As long as it's documented it's okay ... probably ... note that conditionally inserting E would get us right back to the place where an unsafe usage might appear to work fine in light testing. Maybe prepend a space only if prepending E? >> * malloc our own result value instead of expecting caller to provide >> space. �This adds another failure case (out of memory) but it's not >> really much different from OOM on the caller side. > I think that would be an anti-improvement - caller might have their > own malloc substitute, might want to stack-allocate, etc. Plus we'd > either have to malloc the worst-case buffer size or else read the > string twice. But see below... We already have multiple functions in the API that do malloc(). I think that requiring the caller to estimate the size of the space is a bigger anti-feature than using malloc. > I like those names. Maybe something like: > size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char > *from, size_t fromlen, int *error); > size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char > *from, size_t fromlen, int *error); This would get a *lot* simpler if we malloced the result space. char *PQescapeXXX(PQconn *conn, const char *str, size_t len) with NULL result used to signal any failure (and a message left in the conn). > ... Or they can just size the buffer > according to the documentation and then they're safe. Wrong, one thing we would absolutely not do is document any hard guarantee about the maximum space needed. That's isomorphic to documenting the exact escaping algorithm, and doing that has already bit us more times than you know. If we're going to alter the API at all I think we absolutely *must* get rid of that design error. Keep in mind that a large part of wanting to have these functions at all is for simplicity in the calling code. Complicating their API for marginal gains doesn't seem to me to be the right design direction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 19 Jan 2010 15:01 On Tue, Jan 19, 2010 at 2:38 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote: > As long as it's documented it's okay ... probably ... note that > conditionally inserting E would get us right back to the place where > an unsafe usage might appear to work fine in light testing. Maybe > prepend a space only if prepending E? That's a bit of a kludge, but maybe it's liveable. >>> * malloc our own result value instead of expecting caller to provide >>> space. This adds another failure case (out of memory) but it's not >>> really much different from OOM on the caller side. > >> I think that would be an anti-improvement - caller might have their >> own malloc substitute, might want to stack-allocate, etc. Plus we'd >> either have to malloc the worst-case buffer size or else read the >> string twice. But see below... > > We already have multiple functions in the API that do malloc(). I think > that requiring the caller to estimate the size of the space is a bigger > anti-feature than using malloc. > >> I like those names. Maybe something like: > >> size_t PQescapeLiteral(PGconn *conn, char *to, size_t tolen, char >> *from, size_t fromlen, int *error); >> size_t PQescapeIdentifier(PGconn *conn, char *to, size_t tolen, char >> *from, size_t fromlen, int *error); > > This would get a *lot* simpler if we malloced the result space. > > char *PQescapeXXX(PQconn *conn, const char *str, size_t len) > > with NULL result used to signal any failure (and a message left in > the conn). Well, I have to admit that the simplicity of that API is very appealing. Someone will probably eventually request PQescapeXXXButUseMyMalloc(PQConn *conn, const char *str, size_t len, void (*MyMalloc)(size_t)); ... but at least until we change the escaping algorithm again, we can always tell that person to use PQescapeStringConn() and roll their own. So, I'm sold. Do you think we should malloc 2n+X bytes all the time, or do you want to scan the string once to determine how much space is needed and then allocate exactly that much space? It seems to me that it might be sensible to do it this way: 1. Do a locale-aware scan of the input string and count the number of characters we need to escape (num_to_escape). 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use memcpy to copy the input data to the output buffer. 3. Otherwise, allocate len+num_to_escape+5 bytes (space-E-quote-quote-NUL) and do a second locale-aware scan of the input string, copying and escaping as we go (or do you only want the space-E if the escapable character that we find is \ rather than '?). ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Tom Lane on 19 Jan 2010 15:13
Robert Haas <robertmhaas(a)gmail.com> writes: > Do you think we should malloc 2n+X bytes all the time, or do you want > to scan the string once to determine how much space is needed and then > allocate exactly that much space? I'd vote for two scans, as I think we'll soon decide 2n doesn't cut it anyway. One of the issues that needs to be looked at is embedded null characters. We might fail on that for the moment, but I can foresee wanting to send \000 instead. You don't want to pay for 4x do you? > It seems to me that it might be > sensible to do it this way: > 1. Do a locale-aware scan of the input string and count the number of > characters we need to escape (num_to_escape). > 2. If num_to_escape is 0, fast path: allocate len+3 bytes and use > memcpy to copy the input data to the output buffer. > 3. Otherwise, allocate len+num_to_escape+5 bytes > (space-E-quote-quote-NUL) and do a second locale-aware scan of the > input string, copying and escaping as we go (or do you only want the > space-E if the escapable character that we find is \ rather than '?). Yeah, the fast path if no escaping is needed seems attractive. I think that E should only be inserted if we have a backslash to deal with; no reason to generate nonstandard syntax if we don't have to. That would mean keeping two bits of state (escaping-needed and malloc-size) from the initial pass, but that's pretty cheap. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |