[PATCH] Fix leaky VIEWs for RLS [PgSql]

Prev: [HACKERS] SET CONSTRAINTS todo
Next: [HACKERS] hot_standby = on

From: KaiGai Kohei on 3 Jun 2010 20:32

I fixed up the subject.

(2010/06/04 2:23), Marc Munro wrote:
> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(a)postgresql.org
> wrote:
>> [ . . . ]
>>
>> In my current idea, when a qual-node that contains FuncExpr tries to
>> reference a part of relations within a security view, its referencing
>> relations will be expanded to whole of the security view at
>> distribute_qual_to_rels().
>> [ . . . ]
>
> I may be missing something here but this seems a bit too simplistic and,
> I think, fails to deal with an important use case.
>
> Security views, that do anything useful at all, tend to introduce
> performance issues. I am concerned that placing a conceptual barrier
> between the secured and unsecured parts of queries is going to
> unnecessarily restrict what the optimiser can do.
>
> For example consider that we have three secured views, each of the form:
>
> create view s_x as select * from x where i_can_see(x.key);
>
> and consider the query:
>
> select stuff from s_x
> inner join s_y on s_y.key = s_x.key
> inner join s_z on s_z.key = s_x.key
> where fn(s_x.a) = 3;
>
> The optimiser ought to be able to spot the fact that i_can_see() need
> only be called once for each joined result. By placing a barrier (if I
> understand your proposal correctly) between the outermost joins and the
> inner views, doesn't this optimisation become impossible?
>
It seems to me incorrect.

If i_can_see() can filter a part of tuples within table: x, the optimizer
prefers to call i_can_see() on scanning each tables rather than result of
join, because it effectively reduce the size of scan.

In fact, the existing optimizer make the following plan:

postgres=# CREATE FUNCTION i_can_see(int) RETURNS bool IMMUTABLE
AS 'begin return $1 % 1 = 1; end;' LANGUAGE 'plpgsql';
CREATE FUNCTION
postgres=# CREATE VIEW v1 as select * from t1 where i_can_see(a);
CREATE VIEW
postgres=# CREATE VIEW v2 as select * from t2 where i_can_see(x);
CREATE VIEW
postgres=# CREATE VIEW v3 as select * from t3 where i_can_see(s);
CREATE VIEW

postgres=# EXPLAIN SELECT * FROM (v1 JOIN v2 ON v1.a=v2.x) JOIN v3 on v1.a=v3.s;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (cost=0.00..860.47 rows=410 width=108)
-> Nested Loop (cost=0.00..595.13 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: i_can_see(a)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (t2.x = t1.a)
Filter: i_can_see(t2.x)
-> Index Scan using t3_pkey on t3 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (t3.s = t1.a)
Filter: i_can_see(t3.s)
(10 rows)

Sorry, I may miss something your point.

> I think a simpler solution may be possible here. If you can tag the
> function i_can_see() as a security function, at least in the context of
> its use in the security views, and then create the rule that security
> functions are always considered to be lower cost than user-defined
> non-security functions, don't we achieve the result of preventing the
> insecure function from seeing rows that it shouldn't?
>
Sorry, it seems to me you mixed up different issues.

My patch tries to tackle the problem that optimizer distributes outermost
functions into inside of the view when the function takes arguments only
from a side of join, even if security views.
This issue is independent from cost of functions.

On the other hand, when a scan plan has multiple qualifiers to filter
fetched tuples, we have to pay attention on the order to be called.
If outermost functions are called prior to view's policy functions,
it may cause unexpected leaks.

In my opinion, we should consider the nestlevel of the function where
it is originally put. But it is not concluded yet, and the patch does
not tackle anything about this issue.

> I guess my concern is that a query may be constructed a=out of secured
> and unsecured parts and the optimiser should be free to group all of the
> secured parts together before considering the unsecured parts.
>
Yes, it is an opinion, but it may cause unnecessary performance regression
when user given condition is obviously harmless.

E.g, If table: t has primary key t.a, and a security view is defined as:

CREATE VIEW v AS SELECT * FROM t WHERE f_policy(t.b);

When user gives the following query:

SELECT * FROM v WHERE a = 100;

If we strictly separate secured and unsecure part prior to optimizer,
it will break a chance to scan the table: t with index.

I also think security is not free, so it may take a certain level of
performance degradation. But it should not bring down more than necessity.

Thanks,
--
KaiGai Kohei <kaigai(a)ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: KaiGai Kohei on 4 Jun 2010 00:17

(2010/06/04 11:55), Robert Haas wrote:
> On Thu, Jun 3, 2010 at 1:23 PM, Marc Munro<marc(a)bloodnok.com> wrote:
>> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(a)postgresql.org
>> wrote:
>>> [ . . . ]
>>>
>>> In my current idea, when a qual-node that contains FuncExpr tries to
>>> reference a part of relations within a security view, its referencing
>>> relations will be expanded to whole of the security view at
>>> distribute_qual_to_rels().
>>> [ . . . ]
>>
>> I may be missing something here but this seems a bit too simplistic and,
>> I think, fails to deal with an important use case.
>
> If anything, you're putting it mildly. This is quite a bit too
> simplistic and fails to deal with several important issues, at least
> some of which have already been mentioned on this thread.
>
Hmm. Was it too simplified even if just a proof of concept?

>> The optimiser ought to be able to spot the fact that i_can_see() need
>> only be called once for each joined result. By placing a barrier (if I
>> understand your proposal correctly) between the outermost joins and the
>> inner views, doesn't this optimisation become impossible?
>>
>> I think a simpler solution may be possible here. If you can tag the
>> function i_can_see() as a security function, at least in the context of
>> its use in the security views, and then create the rule that security
>> functions are always considered to be lower cost than user-defined
>> non-security functions, don't we achieve the result of preventing the
>> insecure function from seeing rows that it shouldn't?
>
> So, yes and no. You DO need a security barrier between the view and
> the rest of the query, but if a function can be trusted not to do evil
> things, then it should be allowed to be pushed down. What we need to
> prevent is the pushdown of untrusted functions (or operators). A
> (very) important part of this problem is determining which quals are
> safe to push down.
>
At least, I don't have an idea to distinguish trusted functions from
others without any additional hints, because we support variable kind
of PL languages. :(

An idea is to add TRUSTED (or other) keyword for CREATE FUNCTION to mark
this function is harmless to execute, but only DBA can define functions
with the option.
(Perhaps, most of built-in functions should be marked as trusted?)

If we should identify a function as either trusted or untrusted, is
there any other ideas?

Thanks,
--
KaiGai Kohei <kaigai(a)ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 4 Jun 2010 00:57

KaiGai Kohei <kaigai(a)ak.jp.nec.com> writes:
> (2010/06/04 11:55), Robert Haas wrote:
>> A (very) important part of this problem is determining which quals are
>> safe to push down.
>>
> At least, I don't have an idea to distinguish trusted functions from
> others without any additional hints, because we support variable kind
> of PL languages. :(

The proposal some time back in this thread was to trust all built-in
functions and no others. That's a bit simplistic, no doubt, but it
seems to me to largely solve the performance problem and to do so with
minimal effort. When and if you get to a solution that's committable
with respect to everything else, it might be time to think about
more flexible answers to that particular point.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: KaiGai Kohei on 4 Jun 2010 01:24

(2010/06/04 13:57), Tom Lane wrote:
> KaiGai Kohei<kaigai(a)ak.jp.nec.com> writes:
>> (2010/06/04 11:55), Robert Haas wrote:
>>> A (very) important part of this problem is determining which quals are
>>> safe to push down.
>>>
>> At least, I don't have an idea to distinguish trusted functions from
>> others without any additional hints, because we support variable kind
>> of PL languages. :(
>
> The proposal some time back in this thread was to trust all built-in
> functions and no others. That's a bit simplistic, no doubt, but it
> seems to me to largely solve the performance problem and to do so with
> minimal effort. When and if you get to a solution that's committable
> with respect to everything else, it might be time to think about
> more flexible answers to that particular point.
>
Although I've not check all the functions within pg_proc.h, it seems to
me reasonable assumptions, except for a small number of exception which
have side-effects, such as lo_write().

Maybe, we have to shut out a small number of exceptions.
However, at least, it seems to me reasonable assumption in this stage.

Thanks,
--
KaiGai Kohei <kaigai(a)ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Dimitri Fontaine on 4 Jun 2010 05:26

Tom Lane <tgl(a)sss.pgh.pa.us> writes:
> The proposal some time back in this thread was to trust all built-in
> functions and no others. That's a bit simplistic, no doubt, but it
> seems to me to largely solve the performance problem and to do so with
> minimal effort. When and if you get to a solution that's committable
> with respect to everything else, it might be time to think about
> more flexible answers to that particular point.

What about trusting all "internal" and "C" language function instead? My
understanding is that "internal" covers built-in functions, and as you
need to be a superuser to CREATE a "C" language function, surely you're
able to accept that by doing so you get to trust it?

How useful would that be?
--
dim

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

| Next | Last
Pages: 1 2 3 4 5 6 7 8
Prev: [HACKERS] SET CONSTRAINTS todo
Next: [HACKERS] hot_standby = on