GSoC - proposal - Materialized Views in PostgreSQL [PgSql]

Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance

From: Josh Berkus on 12 Apr 2010 13:50

On 4/9/10 1:36 PM, pavelbaros wrote:
> 2) change rewriter
> - usually, view is relation with defined rule and when rewriting, rule
> is fired and relation (view) is replaced by definition of view. If
> relation do not have rule, planner and executor behave to it as physical
> table (relation). In case of materialized view we want to rewrite select
> statement only in case when we refreshing MV. In other cases rewriter
> should skip rewriting and pick up physical relation. Exclude situation
> when other rewrite rules which are not related to MV definition are
> specified.

This was done (although not completed) against PostgreSQL 7.1 by
students in Georgia, USA, I believe. It might be worthwhile looking at
their work if I can find it (if nowhere else, it should be in the ACM).

There are basically 2 major parts for materialized views:

A) Planner: Getting the query planner to swap in the MatView for part of
a query automatically for query plan portions which the MatView supports;

B) Maintenance: maintaining the MatView data according to the programmed
scheme (synch, asynch, periodic).

I do not believe it is possible to do both of the above in one summer.
Of the two, (A) would be more useful since it is possible to manually
implement (B) using triggers, queues and cron jobs today.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 12 Apr 2010 14:04

On Mon, Apr 12, 2010 at 1:50 PM, Josh Berkus <josh(a)agliodbs.com> wrote:
> On 4/9/10 1:36 PM, pavelbaros wrote:
>> 2) change rewriter
>> - usually, view is relation with defined rule and when rewriting, rule
>> is fired and relation (view) is replaced by definition of view. If
>> relation do not have rule, planner and executor behave to it as physical
>> table (relation). In case of materialized view we want to rewrite select
>> statement only in case when we refreshing MV. In other cases rewriter
>> should skip rewriting and pick up physical relation. Exclude situation
>> when other rewrite rules which are not related to MV definition are
>> specified.
>
> This was done (although not completed) against PostgreSQL 7.1 by
> students in Georgia, USA, I believe. It might be worthwhile looking at
> their work if I can find it (if nowhere else, it should be in the ACM).
>
> There are basically 2 major parts for materialized views:
>
> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;
>
> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).
>
> I do not believe it is possible to do both of the above in one summer.
> Of the two, (A) would be more useful since it is possible to manually
> implement (B) using triggers, queues and cron jobs today.

I don't believe that it's possible to do EITHER of those things in one
summer. I believe that a basic implementation that has NO bells and
whistles at all, as originally proposed, is going to be a Very Hard
Project.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 12 Apr 2010 15:43

Josh Berkus wrote:
> There are basically 2 major parts for materialized views:
> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;
> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).
>

I'm run more into problems where it's perfectly fine to specify using
the materialized view directly in the query, but keeping that view up to
date usefully was the real problem. The whole idea of getting a MV used
automatically is valuable, but far down the roadmap as I see it.

Not everyone would agree of course, and your description does suggest a
better way to organize a high-level summary though; here's a first cut:

1) Creation of materalized view
Current state: using "CREATE TABLE AS" or similar mechanism, maintain
manually
Optimal: "CREATE MATERIALIZED VIEW" grammar, metadata to store MV data,
dump/reload support

2) Updating materialized views
Current state: periodically create new snapshots, or maintain using
triggers
Optimal: Built-in refresh via multiple strategies, with minimal locking
as to improve concurrent access

3) Using materialized views in the planner
Current state: specify the manually created MV in queries that can use it
Optimal: Automatically accelerate queries that could be satisfied by
substituting available MVs

With (1) being what I think is the only GSoC sized subset here.

I'm not saying someone can't jump right into (3), using the current
implementations for (1) and (2) that are floating around out there. I
just think it would end up wasting a fair amount of work on prototypes
that don't work quite the same way as the eventual fully integrated
version. You certainly can start working on (3) without a fully fleshed
out implementation of (2), I don't know that it makes sense to work on
before (1) though.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 12 Apr 2010 16:03

On Mon, Apr 12, 2010 at 3:43 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> Josh Berkus wrote:
>>
>> There are basically 2 major parts for materialized views:
>> A) Planner: Getting the query planner to swap in the MatView for part of
>> a query automatically for query plan portions which the MatView supports;
>> B) Maintenance: maintaining the MatView data according to the programmed
>> scheme (synch, asynch, periodic).
>>
>
> I'm run more into problems where it's perfectly fine to specify using the
> materialized view directly in the query, but keeping that view up to date
> usefully was the real problem. The whole idea of getting a MV used
> automatically is valuable, but far down the roadmap as I see it.
>
> Not everyone would agree of course, and your description does suggest a
> better way to organize a high-level summary though; here's a first cut:
>
> 1) Creation of materalized view
> Current state: using "CREATE TABLE AS" or similar mechanism, maintain
> manually
> Optimal: "CREATE MATERIALIZED VIEW" grammar, metadata to store MV data,
> dump/reload support
>
> 2) Updating materialized views
> Current state: periodically create new snapshots, or maintain using
> triggers
> Optimal: Built-in refresh via multiple strategies, with minimal locking as
> to improve concurrent access
>
> 3) Using materialized views in the planner
> Current state: specify the manually created MV in queries that can use it
> Optimal: Automatically accelerate queries that could be satisfied by
> substituting available MVs
>
> With (1) being what I think is the only GSoC sized subset here.
>
> I'm not saying someone can't jump right into (3), using the current
> implementations for (1) and (2) that are floating around out there. I just
> think it would end up wasting a fair amount of work on prototypes that don't
> work quite the same way as the eventual fully integrated version. You
> certainly can start working on (3) without a fully fleshed out
> implementation of (2), I don't know that it makes sense to work on before
> (1) though.

Good summary.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 12 Apr 2010 17:46

Greg,

> I'm not saying someone can't jump right into (3), using the current
> implementations for (1) and (2) that are floating around out there. I
> just think it would end up wasting a fair amount of work on prototypes
> that don't work quite the same way as the eventual fully integrated
> version. You certainly can start working on (3) without a fully fleshed
> out implementation of (2), I don't know that it makes sense to work on
> before (1) though.

What would be the use case for (1) by itself?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7
Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance