From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Yes, it's not a trivial fix either. �We'll have to provide functions or
>> views that replace the current usages without letting the user insert
>> untrusted strings.

> Maybe I'm all wet here, but don't we need to come up with something we
> can back-patch?

Well, ideally yes, but if it's not actually *secure* then there's no
point --- and I don't believe that the approach of making readfuncs.c
secure against malicious input has the proverbial snowball's chance
of ever being bulletproof.

[ thinks for awhile... ] I wonder whether there is any way of locking
down pg_get_expr so that it throws an error if called with anything
except a suitable field from one of the system catalogs. There are only
a few usage patterns that we need to allow, no? At least in recent PG
versions it is possible for the function to check that its input
expression is a Var. If we had some (probably horridly ugly) way to
obtain the rangetable entry the Var refers to, we could put code into
pg_get_expr to barf if it's not used in a context like
"select pg_get_expr(adbin) from pg_attrdef".

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas(a)gmail.com> writes:
>> On Wed, Jun 9, 2010 at 1:34 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>>> Yes, it's not a trivial fix either. �We'll have to provide functions or
>>> views that replace the current usages without letting the user insert
>>> untrusted strings.
>
>> Maybe I'm all wet here, but don't we need to come up with something we
>> can back-patch?
>
> Well, ideally yes, but if it's not actually *secure* then there's no
> point --- and I don't believe that the approach of making readfuncs.c
> secure against malicious input has the proverbial snowball's chance
> of ever being bulletproof.

I don't really see how it could be *impossible* to securely parse text
input. It's certainly possible not to crash on trivially malformed
input. Completely validating the input MIGHT cost more in performance
than we want to pay in CPU cycles, but I guess I'm not seeing why it
would be an unsolvable problem apart from that.

> [ thinks for awhile... ] �I wonder whether there is any way of locking
> down pg_get_expr so that it throws an error if called with anything
> except a suitable field from one of the system catalogs. �There are only
> a few usage patterns that we need to allow, no? �At least in recent PG
> versions it is possible for the function to check that its input
> expression is a Var. �If we had some (probably horridly ugly) way to
> obtain the rangetable entry the Var refers to, we could put code into
> pg_get_expr to barf if it's not used in a context like
> "select pg_get_expr(adbin) from pg_attrdef".

That's sort of clever... in a really ugly sort of way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
I wrote:
> [ thinks for awhile... ] I wonder whether there is any way of locking
> down pg_get_expr so that it throws an error if called with anything
> except a suitable field from one of the system catalogs.

I did a bit of research into this idea. It looks at least somewhat
feasible:

* All PG versions back to 7.4 will pass the calling expression tree via
fcinfo->flinfo->fn_expr. Lack of that would be a showstopper, because
we can't change the FmgrInfo struct in back branches (ABI break). So we
can get the arguments and determine whether they are Vars.

* To determine which table a Var actually refers to, we must have access
to the rangetable, and in join cases we also need access to the plan
tree node containing the Var (since we have to drill down to the plan
node's inputs to resolve OUTER and INNER references). The rangetable is
reachable from the PlanState node, so it's sufficient to make that one
pointer available to functions. The obvious way to handle this is to
add a field to FmgrInfo, and I would suggest doing it that way as of
9.0. In the back branches, we could probably hack it without an ABI
break by having fn_expr point to some intermediate node type that
contains both the actual expression tree and the PlanState pointer
(probably, we'd make it point to FuncExprState instead of directly to
the FuncExpr, and then add the field to FuncExprState, which is far less
dangerous than redefining struct FmgrInfo). Now this depends on the
assumption that no external functions are doing anything with fn_expr
except passing it to the related fmgr.c routines, which we would teach
about this convention.

* Once we've got the above data it's a simple matter of adapting the
Var-interpretation logic used by EXPLAIN in order to find out the table
OID and column number, if any, represented by the Var. I'd suggest
adding such functions in fmgr.c to extend the API currently offered by
get_fn_expr_argtype() and friends. It seems possible that other
functions besides pg_get_expr might have use for such a capability.


What I'm suggesting is definitely not a trivial patch, and I would never
consider back-patching it if it weren't a security matter. But I think
it's doable and we'd be fixing the hole with a determinate amount of
work, whereas trying to verify the validity of pg_get_expr input
directly would be a never-ending source of more security bugs.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on
Robert Haas <robertmhaas(a)gmail.com> writes:
> On Wed, Jun 9, 2010 at 2:04 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> Well, ideally yes, but if it's not actually *secure* then there's no
>> point --- and I don't believe that the approach of making readfuncs.c
>> secure against malicious input has the proverbial snowball's chance
>> of ever being bulletproof.

> I don't really see how it could be *impossible* to securely parse text
> input. It's certainly possible not to crash on trivially malformed
> input.

The operative word in that claim is "trivial". The problem that I see
is that there are many assumptions in the system about the structure and
interrelationships of expression node trees, for instance that certain
List fields contain only certain node types. I don't believe that it's
practical to make the node reading code enforce every one of those
assumptions, or that it'd be maintainable if we did manage to get it
right to start with. Certainly we can make the node reading code do
more checking than it does now, but the odds of making things
bulletproof against malicious input are negligible. I don't want to be
going back to fix another hole every other month for the lifetime of the
project, but that's exactly what we'll be doing if we try to fix it that
way.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on
On Mon, Jun 21, 2010 at 7:50 PM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> On 15/06/10 10:31, Heikki Linnakangas wrote:
>>
>> You could avoid changing the meaning of fn_expr by putting the check in
>> the parse analysis phase, into transformFuncCall(). That would feel
>> safer at least for back-branches.
>
> Here's a patch using that approach.
>
> I grepped through PostgreSQL and pgadmin source code to find the system
> columns where valid node-strings are stored:
>
> pg_index.indexprs
> pg_index.indprep
> pg_attrdef.adbin
> pg_proc.proargdefaults
> pg_constraint.conbin
>
> Am I missing anything?

I think that pg_type.typdefaultbin is used by pg_dump.
pg_rewrite.ev_qual, pg_rewrite.ev_action, pg_trigger.tgqual also
contain nodeToString() output but I didn't have any luck using them
with pg_get_expr() so maybe they don't need to be included.

The only other thing I notice is that, obviously, the FIXME comment
needs to be FIXMEd before commit.

I'd still be in favor of inserting at least some basic error checks
into readfuncs.c, though just in HEAD. The restrictions implemented
here seem adequate to prevent a security vulnerability, but superusers
can still invoke those functions manually, and while superusers can
clearly crash the system in any number of ways, that doesn't seem (to
me) like an adequate justification for ignoring the return value of
strtok(). YMMV, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers