From: Andrew McNamara on 7 Feb 2010 20:25 >I added you into the list at http://wiki.postgresql.org/wiki/Python Thanks. >Can you check what I put in there, confirm Windows compatibility, and >comment on Python 3.X support? I haven't tried it under Windows and I haven't had any feedback either way from Windows users. For now, ocpgdb has no Python 3 support (I don't foresee any real problems, however). >I'd be curious to hear more about the escaping bugs you ran into as well. >We already have some notes on the TODO that pushing more of this work >toward the standard libpq routines would seem appropriate for things >passing between the driver and libpq. Were the issues you ran into on >that side, or more on the Python side of how things were being formatted? It was a while ago now and I can't remember the specific details - it was more a general feeling that the existing offerings were going about it the wrong way (with respect to parameter passing and escaping). I suspect this was a historical artifact (presumably libpq didn't provide escaping facilities or parameterised queries when the adapters were written). Essentially, I just wanted a pyPgSQL with a more modern implementation. Psycopg was (is?) also using Protocol 2. I felt that the way forward was to switch to the Protocol 3 API features, in particular, parameterised queries, and none of the existing Python adapters had done that (I got the impression while writing my module that nobody was exercising the new features). -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 7 Feb 2010 15:23 Josh Berkus wrote: > Anyway, I don't yet have a full diagnosis on the transaction control > issue or I'd already have posted it to psycopg -- it may be a toxic > interaction between Django and Psycopg2 rather than psycopg2 alone. I'd > not have brought it up except for this discussion. > I'm going to remove it from the list on the wiki then for now. I don't want to annoy the developers by adding a more speculative bug that might not even be in their software. If you get to where it's confirmed and info posted to their list, please add a link back into the page once it's reported, i.e. link to their mailing list archives or something like that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg(a)2ndQuadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 8 Feb 2010 03:36 Massa, Harald Armin wrote: > I agree that there are some performance-challenges with pure-Python > drivers. > And we should not forget to look for the reasons for the incubation of > that many pure-Python drivers: > a) Python is no longer one-language, one-implementation. There are (at > least) cPython (the original), Jython (on JVM), IronPython (from > Microsoft on CLR), PyPy (Python on Python), Unladen Swallow (from > Google on LLVM). In addition the nearly-Pythons as in Cython, RPython > and ShedSkin... > especially a) is a point to consider when standard, it's getting one > driver that satisfies the needs of the people most like > izing on a PostgreSQL blessed Python-Postgresql-driver. How will the > blessing extend to Jython / Ironpython / PyPy? The point isn't so much "standardizing". Having a low performance Python driver turns into a PostgreSQL PR issue. Last thing we need is the old "PostgreSQL is slow" meme to crop back up again via the Python community, if the driver suggested by the community isn't written with performance as a goal so that, say, PostgreSQL+Python looks really slow compared to MySQL+Python. And if you're writing a database driver with performance as a goal, native Python is simply not an option. Now, once *that* problem is under control, and there's a nicely licensed, well documented, major feature complete, and good performing driver, at that point it would be completely appropriate to ask "what about people who want support for other Python platforms and don't care if it's slower?". And as you say, nurturing the "incubation" of such drivers is completely worthwhile. I just fear that losing focus by wandering too far in that direction, before resolving the main problem here, is just going to extend resolving the parts of the Python driver situation I feel people are most displeased with. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg(a)2ndQuadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Andrew McNamara on 8 Feb 2010 20:51 >On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote: >> The problem is deeper than that - when query parameters use the binary >> option, the server has no way to decode the binary parameter without an >> appropriate type OID. > >Postgres does not attempt to decode anything (text or binary format) >until it figures out what type it is. How does it figure out what type it is? Either by the type oid passed by the caller, or by the context if the type oid is "unknown". Now, with the text format parameters, the parser usually does the right thing, since text formats have plenty of hints for us humans. However, with the binary format, unless the caller tells us, there's no way to tell whether we're correctly parsing the data. If the context implies one type, but the user passes another, we'll either get an ugly error or, worse, silently misparse their data. Generally this isn't a big problem with python, as we have good type information available. It's only an issue because people have gotten used to the text parameter parsing being so forgiving. Using my ocpgdb module, and interacting directly with the libpq wrapping code, you can see how postgres reacts to various inputs: >>> from oclibpq import * >>> from ocpgdb import pgoid >>> db=PgConnection('') No parameters: >>> r=db.execute('select 1', ()) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)] Int4 parameter, type specified: >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)] Int4 parameter, type unknown, can't be determined from context: >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_FATAL_ERROR >>> r.errorMessage 'ERROR: could not determine data type of parameter $1\n' Int4 parameter, type unknown, can be determined from context: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)] Text parameter, type unknown, mismatching context - surprising: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)] Date parameter, type unknown, int context, the value gets misinterpreted: >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')]) >>> r.status PGRES_TUPLES_OK >>> list(r) [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)] -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Jeff Davis on 8 Feb 2010 21:38
On Tue, 2010-02-09 at 12:51 +1100, Andrew McNamara wrote: > Now, with the text format parameters, the parser usually does the right > thing, since text formats have plenty of hints for us humans. The parser doesn't care whether it's text format or binary format when determining the type. > However, with the binary format, unless the caller tells us, there's no way > to tell whether we're correctly parsing the data. If the context implies > one type, but the user passes another, we'll either get an ugly error or, > worse, silently misparse their data. The difference between text and binary format is this: after it has already determined the type of the parameter, (a) if the format is text, it passes it to the type input function to construct the value; or (b) if the format is binary, it passes it to the type recv function to construct the value. The argument to the input or recv functions may: (a) be valid input; or (b) be invalid input, and be detected as an error by the input or recv function; or (c) be invalid input, and not be detected as an error by the input or recv function. For a given type, the input function may be more likely to catch an input error than the recv function; or the reverse. Either way, it is very type-specific, and the only difference is the whether the input is misinterpreted (type error not caught; bad) or an error is thrown (type error caught; better). > Using my ocpgdb module, and interacting directly with the libpq wrapping > code, you can see how postgres reacts to various inputs: None of the examples show a difference in the inferred type of a text versus binary parameter for the same query. > No parameters: > > >>> r=db.execute('select 1', ()) > >>> r.status > PGRES_TUPLES_OK > >>> list(r) > [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)] Expected, because the literal 1 (without quotes) is an integer literal, not an unknown literal. > Int4 parameter, type specified: > > >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')]) > >>> r.status > PGRES_TUPLES_OK > >>> list(r) > [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)] Expected, because you specified the type, and sent the binary data to the integer recv function, and it was valid input. > Int4 parameter, type unknown, can't be determined from context: > > >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')]) > >>> r.status > PGRES_FATAL_ERROR > >>> r.errorMessage > 'ERROR: could not determine data type of parameter $1\n' Expected -- there is no context to determine the type. Why do you call it an int4 parameter? It's just bytes, and you never told postgres what they are (as you did in the previous example). > Int4 parameter, type unknown, can be determined from context: > > >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')]) > >>> r.status > PGRES_TUPLES_OK > >>> list(r) > [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)] Expected: the function + provides the context that allows the server to interpret the left argument as an integer. (Again, not an int4 parameter, it's unknown) > Text parameter, type unknown, mismatching context - surprising: > > >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')]) > >>> r.status > PGRES_TUPLES_OK > >>> list(r) > [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)] Expected, because this is exactly the same as the previous one except for the data you pass in. Notice that the same type is inferred (23). Why do you call this "mismatching context" when the context is exactly the same as above? The only difference is which 4 bytes you provide. You never told postgres that the bytes were text bytes anywhere. You may think that it's doing 1111 + 1, but it's actually doing addition on the bytes. That is apparent in the next example: > Date parameter, type unknown, int context, the value gets misinterpreted: > > >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')]) > >>> r.status > PGRES_TUPLES_OK > >>> list(r) > [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)] > Expected, because the only thing that could possibly detect the error is the int4recv function, which happens to accept any 4-byte input (so it will never fail on any 4 bytes of data). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |