plpython3 [PgSql]


From: "Joshua D. Drake" on 15 Jan 2010 15:46

On Fri, 2010-01-15 at 13:26 -0700, James William Pye wrote:
> On Jan 14, 2010, at 2:03 PM, Joshua D. Drake wrote:
> > What I (as a non-hacker) would look for is:
> >
> > (1) Generalized benchmarks between plpython(core) and plpython3u
> >
> > I know a lot of these are subjective, but it is still good to see if
> > there are any curves or points that bring the performance of either to
> > light.
>
> I guess I could do some simple function I/O tests to identify invocation overhead (take a single parameter and return it). This should give a somewhat reasonable view of the trade-offs of "native typing" vs conversion performance-wise. One thing to keep in mind is that *three* tests would need to be done per parameter set:
>
> 1. plpython's
> 2. plpython3's (raw data objects/"native typing")
> 3. plpython3's + @pytypes
>
> The third should show degraded performance in comparison to plpythonu's whereas the second should show improvement or near equivalence.
>
> @pytypes is actually implemented in pure-Python, so the impact should be quite visible.
>
> http://python.projects.postgresql.org/pldocs/plpython3-postgres-pytypes.html
>
>
> I'm not sure there's anything else worth measuring. SRFs, maybe?
>
>
> > (2) Example of the traceback facility, I know it is silly but I don't
> > have time to actually download head, apply the patch and test this.
>
> Well, if you ever do find some time, the *easiest* way would probably be to download a branch snapshot from git.pg.org:
>
> http://git.postgresql.org/gitweb?p=plpython3.git;a=snapshot;h=refs/heads/plpython3;sf=tgz
>
> It requires Python 3.1. 3.0 has been abandoned by python.org.
>
> > This
> > type of thing, showing debugging facilities within the function would be
> > killer.
>
> The test output has a *lot* of tracebacks, so I'll just copy and paste one here.
>
> This one shows the traceback output of a chained exception.
>
> -- suffocates a pg error, and attempts to enter a protected area
> CREATE OR REPLACE FUNCTION pg_failure_suf_IFTE() RETURNS VOID LANGUAGE plpython3u AS
> $python$
> import Postgres
>
> rp = Postgres.Type(Postgres.CONST['REGPROCEDUREOID'])
>
> def main():
>     try:
>         fun = rp('nosuchfunc(int17,zzz)')
>     except:
>         # Should be valid, but the protection of
>         # PL_DB_IN_ERROR should keep it from getting called.
>         rp('pg_x_failure_suf()')
> $python$;
>
>
> SELECT pg_failure_suf_IFTE();
> ERROR: database action attempted while in failed transaction
> CONTEXT: [exception from Python]
> Traceback (most recent call last):
>   File "public.pg_failure_suf_ifte()", line 8, in main
>     fun = rp('nosuchfunc(int17,zzz)')
> Postgres.Exception: type "int17" does not exist
> CODE: 42704
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "public.pg_failure_suf_ifte()", line 12, in main
>     rp('pg_x_failure_suf()')
> Postgres.Exception
>
> [public.pg_failure_suf_ifte()]
>
>
> > (3) A distinct real world comparison where the core plpython falls down
> > (if it does) against the plpython3u implementation
>
> Hrm. Are you looking for something that plpython3 can do that plpython can't? Or are you looking for something where plpython makes the user work a lot harder?

I think both apply.

This is great stuff, thank you for taking the effort.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use of Mr. or Sir.

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: James William Pye on 17 Jan 2010 16:06

On Jan 14, 2010, at 7:08 PM, Greg Smith wrote:
> So more targeted examples like you're considering now would help.

So far, I have three specific examples in mind:

The first will illustrate the advantages of function modules wrt setup code in the module body. Primarily this is about convenience. (I'm going to send this example when I send this message)

The second is a generic after trigger that does manipulation logging for some simple replication purposes. This example will illustrate one application of "native typing" as it provides direct access to a PG type's typoutput.

The third one is a fairly old plpythonu example written by Elein that exercises SD to keep state for an aggregate. I'm expecting this to be a good candidate for showing off stateful functions.

Other things I plan to cover, but nothing specific in mind yet:

Direct function calls
Internal subtransactions, "with xact():" (something plpython can't do, save calling plpgsql =)

From: James William Pye on 17 Jan 2010 16:07

On Jan 14, 2010, at 7:08 PM, Greg Smith wrote:
> So more targeted examples like you're considering now would help.

Here's the first example. This covers an advantage of function modules.

This is a conversion of a plpythonu function published to the wiki:

http://wiki.postgresql.org/wiki/Google_Translate

In the above link, the code is executed in the body of a Python function.
Please see plpython's documentation if you don't understand what I mean by that.

The effect of this is that every time the FUNCTION is called from PG, the import statements are run, a new class object, UrlOpener, is created, and a new function object, translate, is created. Granted, a minor amount of overhead in this case, but the point is that in order to avoid it the author would have to use SD:

if "urlopener" in SD:
    UrlOpener = SD["urlopener"]
else:
    class UrlOpener(urllib.URLopener):
        ...
    SD["urlopener"] = UrlOpener

While some may consider this a minor inconvenience, the problem is that *setup code is common*, so it's, at least, a rather frequent, minor inconvenience.

With function modules, users have a module body to run any necessary setup code.
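
To make the difference concrete, here is a minimal sketch in plain Python that runs outside the database. The SD dictionary, expensive_setup(), and body_style_call() are stand-ins I made up for illustration; they are not plpython or plpython3 APIs. It only shows the shape of the two idioms: guarding setup with an SD lookup on every call versus running it once in a module body.

```python
# Simulated stand-in for plpython's per-function SD dictionary (assumption).
SD = {}

# Records every time the (hypothetical) setup step actually executes.
setup_log = []

def expensive_setup():
    """Hypothetical setup step, e.g. imports or class creation."""
    setup_log.append("ran")
    return {"run": len(setup_log)}

def body_style_call():
    # plpython style: the whole body runs on each call, so the author
    # must guard the setup with an SD lookup to avoid repeating it.
    if "resource" in SD:
        resource = SD["resource"]
    else:
        resource = expensive_setup()
        SD["resource"] = resource
    return resource["run"]

# Function-module style: setup runs once, when the module body executes.
module_resource = expensive_setup()

def main():
    # Only the entry point runs per call; no guard is needed.
    return module_resource["run"]
```

Both variants run the setup once; the module version simply does not need the caching boilerplate.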

Now, WRT the actual example code, I'm not suggesting that either example is ideal. Only that it should *help* identify one particular advantage of function modules.

CREATE OR REPLACE FUNCTION public.gtranslate(src text, target text, phrase text)
RETURNS text
LANGUAGE plpython3u
AS $function$
from urllib.request import URLopener
from urllib.parse import quote_plus
import json

base_uri = "http://ajax.googleapis.com/ajax/services/language/translate?"

class UrlOpener(URLopener):
    version = "py-gtranslate/1.0"
urlopen = UrlOpener().open

equal_fmt = '{0}={1}'.format

@pytypes
def main(src, to, phrase):
    args = (
        ('v', '1.0'),
        ('langpair', quote_plus(src + '|' + to)),
        ('q', quote_plus(phrase)),
    )
    argstring = '&'.join([equal_fmt(k,v) for (k,v) in args])

    resp = urlopen(base_uri + argstring).read()
    resp = json.loads(resp.decode('utf-8'))
    try:
        return resp['responseData']['translatedText']
    except:
        # should probably warn about failed translation
        return phrase
$function$;

pl_regression=# SELECT gtranslate('en', 'es', 'i like coffee');
gtranslate
------------------
Me gusta el café
(1 row)

pl_regression=# SELECT gtranslate('en', 'de', 'i like coffee');
gtranslate
----------------
Ich mag Kaffee
(1 row)


From: David Blewett on 17 Jan 2010 17:15

On Sun, Jan 17, 2010 at 4:07 PM, James William Pye <lists(a)jwp.name> wrote:
> The effect of this is that every time the FUNCTION is called from PG, the import statements are run, a new class object, UrlOpener, is created, and a new function object, translate, is created. Granted, a minor amount of overhead in this case, but the point is that in order to avoid it the author would have to use SD:
>
> if "urlopener" in SD:
>     UrlOpener = SD["urlopener"]
> else:
>     class UrlOpener(urllib.URLopener):
>         ...
>     SD["urlopener"] = UrlOpener
>
> While some may consider this a minor inconvenience, the problem is that *setup code is common*, so it's, at least, a rather frequent, minor inconvenience.
>
>
> With function modules, users have a module body to run any necessary setup code.

Coming from a Python developer perspective, this is indeed an
improvement. I always thought the whole SD/GD thing was a little odd.
Doing the setup at the module level and relying on the interpreter to
keep it cached is much more "Pythonic" and is a common idiom.

David


From: James William Pye on 23 Jan 2010 15:28

On Jan 14, 2010, at 7:08 PM, Greg Smith wrote:
> So more targeted examples like you're considering now would help.

Here's the trigger example which should help reveal some of the advantages of "native typing". This is a generic trigger that constructs and logs manipulation statements for simple replication purposes.

The original plpython version is located here:

http://ar.pycon.org/common/2009/talkdata/PyCon2009/020/plpython.txt
[You'll need to scroll down to the very bottom of that page.]

There are three points in this example that need to be highlighted:

1. There is no need for a "mogrify" function (see original in the above link).
2. Attributes/columns of the records (new/old) are extracted when referenced.
3. The comparisons in after_update use the data type's actual inequality operator.

The first point is true because "native typing" gives the user direct access to a given type's typoutput via ``str(ob)``. This makes constructing the PG string representation of a given object *much* easier--quote_nullable, and done. The original plpython example will need to be updated to compensate for any changes in conversion: arrays will now need special handling and MD arrays will not work at all. It also relies heavily on the Python object representation matching PG's; where that fails, special cases need to be implemented (composites, notably). All of that compensation performed in the original version is unnecessary in the plpython3 version.
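
As a rough illustration of "quote_nullable, and done" (this is not the code in afterlog.py), the sketch below builds an INSERT statement from values whose str() is assumed to already be the type's text output. The quote_nullable() here is a simplified Python stand-in for the PG function of the same name: it only doubles single quotes and does not deal with backslashes. build_insert() is a hypothetical helper and does not quote identifiers.

```python
def quote_nullable(value):
    """Simplified stand-in for PG's quote_nullable().

    None becomes the keyword NULL; anything else is rendered with str()
    and wrapped in single quotes, doubling embedded single quotes.
    Backslash handling is deliberately left out of this sketch.
    """
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"

def build_insert(table, row):
    """Hypothetical helper: build an INSERT from a column -> value mapping.

    Table and column names are used as-is (no identifier quoting).
    """
    columns = ", ".join(row)
    values = ", ".join(quote_nullable(v) for v in row.values())
    return "INSERT INTO {0} ({1}) VALUES ({2});".format(table, columns, values)

statement = build_insert("t", {"id": 1, "name": "O'Reilly", "note": None})
```

With native typing, no per-type "mogrify" logic is needed because str() is the only conversion involved.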

The second point touches on the "efficiency" that was referenced in an earlier message. No cycles are spent converting the contents of a container object unless the user chooses to. Naturally, there is no advantage performance-wise if you are always converting everything.
I'd wager that with triggers, it's rare that everything needs to be converted.
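
A small sketch of the idea, in plain Python and with made-up names (LazyRecord and eager_convert are not plpython3 classes or functions): the record keeps the raw text of each column and converts a column only when it is referenced, whereas the eager variant converts everything up front.

```python
import json

class LazyRecord:
    """Hypothetical row object that converts a column only on access."""

    def __init__(self, raw, converters):
        self._raw = raw                # column name -> unconverted text
        self._converters = converters  # column name -> conversion callable
        self.conversions = 0           # number of conversions performed

    def __getitem__(self, name):
        self.conversions += 1
        return self._converters[name](self._raw[name])

def eager_convert(raw, converters):
    """Convert every column up front, whether or not it is used."""
    return {name: converters[name](text) for name, text in raw.items()}

raw = {"id": "42", "payload": "[1, 2, 3]", "label": "coffee"}
converters = {"id": int, "payload": json.loads, "label": str}

rec = LazyRecord(raw, converters)
row_id = rec["id"]                     # only this column is converted
everything = eager_convert(raw, converters)
```

A trigger that only looks at one or two columns pays for one or two conversions in the lazy case, and for all of them in the eager case.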

The third point reveals that Postgres.Object instances--a component of native typing--use the data type's operator for inequality. It's not limited to comparisons as all available Python operators are mapped to corresponding operators in PG. For many or all primitives, there is no added value over conversion. However, this provides a lot of convenience when working with UDTs, datetime types, and geometric types.
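
To show why using the type's own operator matters, here is a toy model in plain Python. TypedValue and the TYPE_EQ table are invented for this sketch and are not how Postgres.Object is implemented; Decimal merely stands in for numeric's equality semantics. The point is that the Python operator is routed to an operator chosen by the data type, not to a comparison of text or of converted Python objects.

```python
from decimal import Decimal

# Hypothetical per-type equality operators, keyed by type name.
TYPE_EQ = {
    "text": lambda a, b: a == b,
    "numeric": lambda a, b: Decimal(a) == Decimal(b),
}

class TypedValue:
    """Toy value that keeps its text form and delegates == to its type."""

    def __init__(self, typname, text):
        self.typname = typname
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, TypedValue) or other.typname != self.typname:
            return NotImplemented
        return TYPE_EQ[self.typname](self.text, other.text)

    # Python 3 derives != from __eq__, so inequality follows the same operator.

same_number = TypedValue("numeric", "1.0") == TypedValue("numeric", "1.00")
different_text = TypedValue("text", "1.0") != TypedValue("text", "1.00")
```

The two numeric values compare equal even though their text forms differ, while the same strings typed as text do not.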

....ISTM that the primary advantage of "native typing" is that we get to define the Python interface to a given Postgres data type.

Three files are attached:

afterlog.py - the trigger returning function
afterlog.sql - the sql exercising the TRF (creates the replica_log table as well)
afterlog.out - the contents of the replica_log table after executing afterlog.sql

To replay:

\i afterlog.py
\i afterlog.sql
SELECT * FROM replica_log;
