Confusion over Python drivers [PgSql]

Prev: Writeable CTEs patch
Next: WIP patch for system-catalog vacuuming via a relationmap

From: Andrew McNamara on 7 Feb 2010 20:25

>I added you into the list at http://wiki.postgresql.org/wiki/Python

Thanks.

>Can you check what I put in there, confirm Windows compatibility, and
>comment on Python 3.X support?

I haven't tried it under Windows and I haven't had any feedback either
way from Windows users.

For now, ocpgdb has no Python 3 support (I don't foresee any real
problems, however).

>I'd be curious to hear more about the escaping bugs you ran into as well.
>We already have some notes on the TODO that pushing more of this work
>toward the standard libpq routines would seem appropriate for things
>passing between the driver and libpq. Were the issues you ran into on
>that side, or more on the Python side of how things were being formatted?

It was a while ago now and I can't remember the specific details - it was
more a general feeling that the existing offerings were going about it
the wrong way (with respect to parameter passing and escaping). I suspect
this was a historical artifact (presumably libpq didn't provide escaping
facilities or parameterised queries when the adapters were written).

Essentially, I just wanted a pyPgSQL with a more modern implementation.
Psycopg was (is?) also using Protocol 2. I felt that the way forward was
to switch to the Protocol 3 API features, in particular, parameterised
queries, and none of the existing Python adapters had done that (I got
the impression while writing my module that nobody was exercising the
new features).

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 7 Feb 2010 15:23

Josh Berkus wrote:
> Anyway, I don't yet have a full diagnosis on the transaction control
> issue or I'd already have posted it to psycopg -- it may be a toxic
> interaction between Django and Psycopg2 rather than psycopg2 alone. I'd
> not have brought it up except for this discussion.
>

I'm going to remove it from the list on the wiki then for now. I don't
want to annoy the developers by adding a more speculative bug that might
not even be in their software. If you get to where it's confirmed and
info posted to their list, please add a link back into the page once
it's reported, i.e. link to their mailing list archives or something
like that.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 8 Feb 2010 03:36

Massa, Harald Armin wrote:
> I agree that there are some performance-challenges with pure-Python
> drivers.
> And we should not forget to look for the reasons for the incubation of
> that many pure-Python drivers:
> a) Python is no longer one-language, one-implementation. There are (at
> least) cPython (the original), Jython (on JVM), IronPython (from
> Microsoft on CLR), PyPy (Python on Python), Unladen Swallow (from
> Google on LLVM). In addition the nearly-Pythons as in Cython, RPython
> and ShedSkin...
> especially a) is a point to consider when standard, it's getting one
> driver that satisfies the needs of the people most like
> izing on a PostgreSQL blessed Python-Postgresql-driver. How will the
> blessing extend to Jython / Ironpython / PyPy?

The point isn't so much "standardizing". Having a low performance
Python driver turns into a PostgreSQL PR issue. Last thing we need is
the old "PostgreSQL is slow" meme to crop back up again via the Python
community, if the driver suggested by the community isn't written with
performance as a goal so that, say, PostgreSQL+Python looks really slow
compared to MySQL+Python. And if you're writing a database driver with
performance as a goal, native Python is simply not an option.

Now, once *that* problem is under control, and there's a nicely
licensed, well documented, major feature complete, and good performing
driver, at that point it would be completely appropriate to ask "what
about people who want support for other Python platforms and don't care
if it's slower?". And as you say, nurturing the "incubation" of such
drivers is completely worthwhile. I just fear that losing focus by
wandering too far in that direction, before resolving the main problem
here, is just going to extend resolving the parts of the Python driver
situation I feel people are most displeased with.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Andrew McNamara on 8 Feb 2010 20:51

>On Tue, 2010-02-09 at 10:46 +1100, Andrew McNamara wrote:
>> The problem is deeper than that - when query parameters use the binary
>> option, the server has no way to decode the binary parameter without an
>> appropriate type OID.
>
>Postgres does not attempt to decode anything (text or binary format)
>until it figures out what type it is.

How does it figure out what type it is? Either by the type oid passed by
the caller, or by the context if the type oid is "unknown".

Now, with the text format parameters, the parser usually does the right
thing, since text formats have plenty of hints for us humans.

However, with the binary format, unless the caller tells us, there's no way
to tell whether we're correctly parsing the data. If the context implies
one type, but the user passes another, we'll either get an ugly error or,
worse, silently misparse their data.

Generally this isn't a big problem with python, as we have good type
information available. It's only an issue because people have gotten
used to the text parameter parsing being so forgiving.

Using my ocpgdb module, and interacting directly with the libpq wrapping
code, you can see how postgres reacts to various inputs:

>>> from oclibpq import *
>>> from ocpgdb import pgoid
>>> db=PgConnection('')

No parameters:

>>> r=db.execute('select 1', ())
>>> r.status
PGRES_TUPLES_OK
>>> list(r)
[(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)]

Int4 parameter, type specified:

>>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')])
>>> r.status
PGRES_TUPLES_OK
>>> list(r)
[(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)]

Int4 parameter, type unknown, can't be determined from context:

>>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')])
>>> r.status
PGRES_FATAL_ERROR
>>> r.errorMessage
'ERROR: could not determine data type of parameter $1\n'

Int4 parameter, type unknown, can be determined from context:

>>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')])
>>> r.status
PGRES_TUPLES_OK
>>> list(r)
[(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)]

Text parameter, type unknown, mismatching context - surprising:

>>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')])
>>> r.status
PGRES_TUPLES_OK
>>> list(r)
[(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)]

Date parameter, type unknown, int context, the value gets misinterpreted:

>>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')])
>>> r.status
PGRES_TUPLES_OK
>>> list(r)
[(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)]

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Jeff Davis on 8 Feb 2010 21:38

On Tue, 2010-02-09 at 12:51 +1100, Andrew McNamara wrote:
> Now, with the text format parameters, the parser usually does the right
> thing, since text formats have plenty of hints for us humans.

The parser doesn't care whether it's text format or binary format when
determining the type.

> However, with the binary format, unless the caller tells us, there's no way
> to tell whether we're correctly parsing the data. If the context implies
> one type, but the user passes another, we'll either get an ugly error or,
> worse, silently misparse their data.

The difference between text and binary format is this: after it has
already determined the type of the parameter,
(a) if the format is text, it passes it to the type input function
to construct the value; or
(b) if the format is binary, it passes it to the type recv function
to construct the value.

The argument to the input or recv functions may:
(a) be valid input; or
(b) be invalid input, and be detected as an error by the input or
recv function; or
(c) be invalid input, and not be detected as an error by the input
or recv function.

For a given type, the input function may be more likely to catch an
input error than the recv function; or the reverse. Either way, it is
very type-specific, and the only difference is the whether the input is
misinterpreted (type error not caught; bad) or an error is thrown (type
error caught; better).

> Using my ocpgdb module, and interacting directly with the libpq wrapping
> code, you can see how postgres reacts to various inputs:

None of the examples show a difference in the inferred type of a text
versus binary parameter for the same query.

> No parameters:
>
> >>> r=db.execute('select 1', ())
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb7514200>,)]

Expected, because the literal 1 (without quotes) is an integer literal,
not an unknown literal.

> Int4 parameter, type specified:
>
> >>> r=db.execute('select $1', [(pgoid.int4, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x01' at 0xb75141c0>,)]

Expected, because you specified the type, and sent the binary data to
the integer recv function, and it was valid input.

> Int4 parameter, type unknown, can't be determined from context:
>
> >>> r=db.execute('select $1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_FATAL_ERROR
> >>> r.errorMessage
> 'ERROR: could not determine data type of parameter $1\n'

Expected -- there is no context to determine the type.

Why do you call it an int4 parameter? It's just bytes, and you never
told postgres what they are (as you did in the previous example).

> Int4 parameter, type unknown, can be determined from context:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x00\x01')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x00\x02' at 0xb7514200>,)]

Expected: the function + provides the context that allows the server to
interpret the left argument as an integer.

(Again, not an int4 parameter, it's unknown)

> Text parameter, type unknown, mismatching context - surprising:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '1111')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '1112' at 0xb7514360>,)]

Expected, because this is exactly the same as the previous one except
for the data you pass in. Notice that the same type is inferred (23).

Why do you call this "mismatching context" when the context is exactly
the same as above? The only difference is which 4 bytes you provide. You
never told postgres that the bytes were text bytes anywhere.

You may think that it's doing 1111 + 1, but it's actually doing addition
on the bytes. That is apparent in the next example:

> Date parameter, type unknown, int context, the value gets misinterpreted:
>
> >>> r=db.execute('select $1 + 1', [(pgoid.unknown, '\x00\x00\x01n')])
> >>> r.status
> PGRES_TUPLES_OK
> >>> list(r)
> [(<PyPgCell name '?column?', type 23, modifier -1, value '\x00\x00\x01o' at 0xb75144a0>,)]
>

Expected, because the only thing that could possibly detect the error is
the int4recv function, which happens to accept any 4-byte input (so it
will never fail on any 4 bytes of data).

Regards,
Jeff Davis

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Prev: Writeable CTEs patch
Next: WIP patch for system-catalog vacuuming via a relationmap