Some notes about Param handling with "Oracle style" plpgsql variables [PgSql]

Prev: libpq - extending PQexecParams/PQexecPrepared tospecify resultFormat for individual result columns
Next: Renaming conversion procs (was Re: [GENERAL] Error on compile for Windows)

From: Pavel Stehule on 2 Nov 2009 12:01

2009/11/2 Tom Lane <tgl(a)sss.pgh.pa.us>:
> One of the interesting properties of Oracle-compatible variable
> references in plpgsql is that the set of variables referenced by a
> given query could change during a forced replan. Â For example,
> consider
>
> Â Â Â Â declare x int;
> Â Â Â Â Â Â Â Â r record;
> Â Â Â Â ...
> Â Â Â Â for r in select x,y from tab loop ...
>
> If tab contains a column "x" then the "x" reference in the SELECT
> refers to tab.x; if not, it refers to the plpgsql variable x.
> So when first executing the SELECT we might find that it requires
> a Param reference to the plpgsql variable, and then after a replan
> is forced by ALTER TABLE tab ADD COLUMN x, there is no need for
> the Param anymore. Â Or vice versa.
>
> This kinda calls into question whether the Oracle way is actually
> a good idea or not; but my purpose here is not to debate that,
> just to look at what it takes to implement it.
>
> Currently, plpgsql generates a list of the variables referenced by
> any SQL statement or expression immediately upon seeing the text,
> before it's ever even fed to the core parser. Â I had been envisioning
> having the parser callback hook construct the list on-the-fly during
> parsing, but the possibility that the list will change from time to
> time means that other changes are needed too. Â Notably:
>
> 1. plancache.c does not have any provision for letting the Param type
> array associated with a stored statement change when the statement is
> replanned due to SI invalidation.
>
> 2. The control flow for a replan is that plpgsql calls SPI_execute_plan,
> which calls RevalidateCachedPlan, which does the replan if the cached
> plan is discovered to be stale. Â However, plpgsql already had to set up
> the list of actual parameter values before it called SPI_execute_plan,
> which means it is *way* too late to change the list of required Params
> even if plancache let us do it.
>
> After chewing on these facts for awhile, I am thinking that the best
> solution is for plpgsql to abandon the notion of a predetermined list
> of parameters for a SQL query altogether. Â What that list basically
> provides is a mapping from Param numbers ($n) to plpgsql "datum numbers"
> (indexes in the list of a plpgsql function's variables). Â We could make
> that mapping always be one-to-one, since there's no real reason that the
> Params available to a query have to be consecutively numbered. Â So the
> transformColumnRef hook would just pass back a Param using the
> referenced variable's datum number as paramid; it wouldn't bother at all
> with building a data structure listing the specific variables actually
> used in the query.
>
> As far as plancache goes, it would therefore always see a null array
> of Param type OIDs associated with a plpgsql-generated query, and we'd
> not have to provide a way to update that. Â (We'd still keep the ability
> to store such an array, because most other callers of plancache will
> still want a fixed list of Params.) Â What we'd have to add to plancache
> instead is the ability to install caller-determined parser callback
> hooks when it is calling the parser for a replan. Â This seems fairly
> easy to do --- I'm envisioning a sort of meta-hook function that gets
> called with the new ParseState and can insert hook function pointers
> in it.
>
> The other issue with this is what to do at runtime. Â We could do it
> with no other changes if we had plpgsql always set up Values/Nulls
> arrays listing *every* datum's current value. Â This seems a bit
> brute-force though --- it could be slow in a function with a lot of
> variables, and in most cases any specific query or expression would
> not need most of those values. Â What I think we should do instead
> is extend the ParamListInfo structure to add a callback hook function
> that populates individual ParamExternData array entries on-demand.
> The core executor would call the hook when it tried to fetch the
> value of a Param that was currently invalid (ptype == 0). Â So the
> hook would be invoked only once per query per referenced parameter,
> which shouldn't be much overhead. Â Another interesting property
> of this approach is that it'd fix the longstanding user complaint
> that constructions like
> Â Â Â Â if (TG_OP = 'INSERT' and NEW.foo = 'bar') ....
> fail prematurely. Â The executor would never demand the value
> of NEW.foo, and thus not fail, if TG_OP isn't INSERT.
>

good idea.

regards
Pavel

> Comments?
>
> Â Â Â Â Â Â Â Â Â Â Â Â regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 2 Nov 2009 12:42

On Mon, Nov 2, 2009 at 11:31 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Another interesting property
> of this approach is that it'd fix the longstanding user complaint
> that constructions like
> if (TG_OP = 'INSERT' and NEW.foo = 'bar') ...
> fail prematurely. The executor would never demand the value
> of NEW.foo, and thus not fail, if TG_OP isn't INSERT.

I don't really know enough to comment on the best way to go about this
project overall, but fixing this would be incredibly nice, if it can
be done without too much damage.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on 2 Nov 2009 14:05

2009/11/2 Tom Lane <tgl(a)sss.pgh.pa.us>:
> One of the interesting properties of Oracle-compatible variable
> references in plpgsql is that the set of variables referenced by a
> given query could change during a forced replan. Â For example,
> consider
>
> Â Â Â Â declare x int;
> Â Â Â Â Â Â Â Â r record;
> Â Â Â Â ...
> Â Â Â Â for r in select x,y from tab loop ...
>
> If tab contains a column "x" then the "x" reference in the SELECT
> refers to tab.x; if not, it refers to the plpgsql variable x.
> So when first executing the SELECT we might find that it requires
> a Param reference to the plpgsql variable, and then after a replan
> is forced by ALTER TABLE tab ADD COLUMN x, there is no need for
> the Param anymore. Â Or vice versa.
>
> This kinda calls into question whether the Oracle way is actually
> a good idea or not; but my purpose here is not to debate that,
> just to look at what it takes to implement it.

This is reason, why I would to see third mode (incompatible with
Oracle or pg), that raise error, when it detects any intersecting
identifiers. I understand so this mode should not be default, but
personally I'll use it everywhere.

Pavel

>
> Currently, plpgsql generates a list of the variables referenced by
> any SQL statement or expression immediately upon seeing the text,
> before it's ever even fed to the core parser. Â I had been envisioning
> having the parser callback hook construct the list on-the-fly during
> parsing, but the possibility that the list will change from time to
> time means that other changes are needed too. Â Notably:
>
> 1. plancache.c does not have any provision for letting the Param type
> array associated with a stored statement change when the statement is
> replanned due to SI invalidation.
>
> 2. The control flow for a replan is that plpgsql calls SPI_execute_plan,
> which calls RevalidateCachedPlan, which does the replan if the cached
> plan is discovered to be stale. Â However, plpgsql already had to set up
> the list of actual parameter values before it called SPI_execute_plan,
> which means it is *way* too late to change the list of required Params
> even if plancache let us do it.
>
> After chewing on these facts for awhile, I am thinking that the best
> solution is for plpgsql to abandon the notion of a predetermined list
> of parameters for a SQL query altogether. Â What that list basically
> provides is a mapping from Param numbers ($n) to plpgsql "datum numbers"
> (indexes in the list of a plpgsql function's variables). Â We could make
> that mapping always be one-to-one, since there's no real reason that the
> Params available to a query have to be consecutively numbered. Â So the
> transformColumnRef hook would just pass back a Param using the
> referenced variable's datum number as paramid; it wouldn't bother at all
> with building a data structure listing the specific variables actually
> used in the query.
>
> As far as plancache goes, it would therefore always see a null array
> of Param type OIDs associated with a plpgsql-generated query, and we'd
> not have to provide a way to update that. Â (We'd still keep the ability
> to store such an array, because most other callers of plancache will
> still want a fixed list of Params.) Â What we'd have to add to plancache
> instead is the ability to install caller-determined parser callback
> hooks when it is calling the parser for a replan. Â This seems fairly
> easy to do --- I'm envisioning a sort of meta-hook function that gets
> called with the new ParseState and can insert hook function pointers
> in it.
>
> The other issue with this is what to do at runtime. Â We could do it
> with no other changes if we had plpgsql always set up Values/Nulls
> arrays listing *every* datum's current value. Â This seems a bit
> brute-force though --- it could be slow in a function with a lot of
> variables, and in most cases any specific query or expression would
> not need most of those values. Â What I think we should do instead
> is extend the ParamListInfo structure to add a callback hook function
> that populates individual ParamExternData array entries on-demand.
> The core executor would call the hook when it tried to fetch the
> value of a Param that was currently invalid (ptype == 0). Â So the
> hook would be invoked only once per query per referenced parameter,
> which shouldn't be much overhead. Â Another interesting property
> of this approach is that it'd fix the longstanding user complaint
> that constructions like
> Â Â Â Â if (TG_OP = 'INSERT' and NEW.foo = 'bar') ....
> fail prematurely. Â The executor would never demand the value
> of NEW.foo, and thus not fail, if TG_OP isn't INSERT.
>
> Comments?
>
> Â Â Â Â Â Â Â Â Â Â Â Â regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 2 Nov 2009 14:16

Pavel Stehule <pavel.stehule(a)gmail.com> writes:
> 2009/11/2 Tom Lane <tgl(a)sss.pgh.pa.us>:
>> This kinda calls into question whether the Oracle way is actually
>> a good idea or not; but my purpose here is not to debate that,
>> just to look at what it takes to implement it.

> This is reason, why I would to see third mode (incompatible with
> Oracle or pg), that raise error, when it detects any intersecting
> identifiers. I understand so this mode should not be default, but
> personally I'll use it everywhere.

Actually, I thought we'd decided that "throw error on conflict"
*would* be the default behavior.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel Stehule on 2 Nov 2009 14:18

2009/11/2 Tom Lane <tgl(a)sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule(a)gmail.com> writes:
>> 2009/11/2 Tom Lane <tgl(a)sss.pgh.pa.us>:
>>> This kinda calls into question whether the Oracle way is actually
>>> a good idea or not; but my purpose here is not to debate that,
>>> just to look at what it takes to implement it.
>
>> This is reason, why I would to see third mode (incompatible with
>> Oracle or pg), that raise error, when it detects any intersecting
>> identifiers. I understand so this mode should not be default, but
>> personally I'll use it everywhere.
>
> Actually, I thought we'd decided that "throw error on conflict"
> *would* be the default behavior.

good

regards
Pavel
>
> Â Â Â Â Â Â Â Â Â Â Â Â regards, tom lane
>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2
Prev: libpq - extending PQexecParams/PQexecPrepared tospecify resultFormat for individual result columns
Next: Renaming conversion procs (was Re: [GENERAL] Error on compile for Windows)