From: KaiGai Kohei on
(2010/06/08 11:28), Stephen Frost wrote:
> For the sake of clarity..
>
> * KaiGai Kohei (kaigai(a)ak.jp.nec.com) wrote:
>> OK, it was too implementation-specific.
>
> No, that wasn't the problem. There isn't an actual implementation yet
> for it to be too-specific on. The problem is that proposing a change to
> the catalog without figuring out what it'd actually be used for in an
> overall solution is a waste of time.
>
Indeed,

>> Please return to the categorization with 3-level that I mentioned at
>> the previous message.
>
> As Robert said, we're off in the weeds here. I'm not convinced that
> we've got 3 levels, for starters. It could well be fewer, or more.
> Let's stop making assumptions about what's OK and what's not OK.
>
Indeed, we may find out the 4th category in the future.

>> For built-in functions, the code should be reviewed to ensure it does not
>> expose the given argument using error messages.
>> Then, we can mark it as trusted.
>
> One thing that I think *is* clear- removing useful information from
> error messages is *not* going to be an acceptable "solution".
>
Even if it is conditional, like as Greg Stark suggested?

Thanks,
--
KaiGai Kohei <kaigai(a)ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: KaiGai Kohei on
(2010/06/08 11:15), Robert Haas wrote:
> 2010/6/7 KaiGai Kohei<kaigai(a)ak.jp.nec.com>:
>> Our headache is on functions categorized to middle-threat. It enables to
>> leak the given arguments using error messages. Here are several ideas,
>> but they have good and bad points.
>
> I think we are altogether off in the weeds here. We ought to start
> with an implementation that pushes nothing down, and then try to
> figure out how much we can relax that without too much compromising
> security.
>

The attached patch tries to prevent pushing down anything into subqueries
from outside of them.

The distribute_qual_to_rels() tries to distribute the given qualifier
into a certain scanning-plan based on the dependency of qualifier.

E.g) SELECT * FROM (SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(t1.a)) WHERE f_user(t2.x);

In this case, f_user() function depends on only t2 table, so it is
reasonable to attach on the scanning plan of t2 from perspective of
performance.

However, f_user() may have a side-effect which writes arguments into
somewhere. If here is such a possibility, f_user() should not be called
before the joined tuples being filtered by f_policy().

In the case when we can ensure all functions within the qualifier are
enough trustable, we don't need to prevent them to be pushed down.
But the algorithm to determine it is under discussion. So, right now,
we prevent all the possible pushing down.

Example.1) CREATE VIEW v1 AS SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a);
SELECT * FROM v1 WHERE f_malicious(b);

* without this patch
postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=639.01..667.29 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=637.30..637.30 rows=137 width=36)
-> Seq Scan on t1 (cost=0.00..637.30 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(6 rows)

* with this patch
postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=334.93..468.44 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
Join Filter: f_malicious(t1.b)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=329.80..329.80 rows=410 width=36)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: f_policy(a)
(7 rows)

It prevents to push down f_malicious() inside of the join loop.


Example.2) CREATE VIEW v1 AS SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a);
SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);

* without this patch
postgres=# EXPLAIN SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------------
Hash Join (cost=669.01..697.29 rows=137 width=108)
Hash Cond: (t3.s = t1.a)
-> Seq Scan on t3 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=667.29..667.29 rows=137 width=72)
-> Hash Join (cost=639.01..667.29 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=637.30..637.30 rows=137 width=36)
-> Seq Scan on t1 (cost=0.00..637.30 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(10 rows)

* with this patch
postgres=# EXPLAIN SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------------
Hash Join (cost=470.15..498.43 rows=137 width=108)
Hash Cond: (t3.s = t1.a)
-> Seq Scan on t3 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=468.44..468.44 rows=137 width=72)
-> Hash Join (cost=334.93..468.44 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
Join Filter: f_malicious(t1.b)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=329.80..329.80 rows=410 width=36)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: f_policy(a)
(11 rows)

It also prevents f_malisious() to be pushed down into the join loop within view,
but we can push it down into same level of the query.


Please note that it specially handles equality operator at the bottom half of
the distribute_qual_to_rels(), so this patch does not care about these cases.
However, I'm not in hustle to prevent these optimization, because I guess
these should be entirely trusted. So, the patch is in just a start up phase,
not commitable anyway.

postgres=# EXPLAIN SELECT * FROM v1 WHERE b = 'aaa';
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..349.44 rows=2 width=72)
-> Seq Scan on t1 (cost=0.00..332.88 rows=2 width=36)
Filter: ((b = 'aaa'::text) AND f_policy(a))
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (t2.x = t1.a)
(5 rows)

Thanks,
--
KaiGai Kohei <kaigai(a)ak.jp.nec.com>