GSoC - proposal - Materialized Views in PostgreSQL [PgSql]

Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance

From: Robert Haas on 11 Apr 2010 01:08

On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith <greg(a)2ndquadrant.com> wrote:
> To be frank, that makes for a materalized view implementation of little
> value over what you can currently do as far as I'm concerned. It might be
> interesting as a prototype, but that's not necessarily going to look like
> what's needed to do this for real at all. I'm not a big fan of dumping work
> into projects when you can see exactly how it's going to fail before you
> even get started. As I see if, if you know where it's going to fall down,
> you don't need to build a prototype as an exercise to show you how to build
> it--you should work on that part first instead.

Hopefully, you're already aware that I have enormous respect for your
opinions on a wide variety of topics; if not, let me publicly say that
I absolutely do.

Having said that, I disagree with your conclusions in this instance.
I see nothing but upside from this work. It is vastly easier to write
a patch that builds on existing functionality than it is to write
something new from scratch. If there's any value in having manually
refreshed materialized views, then having the simplest possible
implementation of what those can look like committed will make it far
easier to plan out next steps. While the proposed implementation may
not solve a huge number of real-world problems, I think there's a good
argument that some people will get good use of it. Not everyone has
1TB tables with continuous access patterns. And, provided that it
doesn't conflict with anything we want to do in the future, being
useful to some people is a good enough reason to put it in.

I also think that you're underestimating the number of problems that
will have to be solved to get this done. It's going to take some
significant work - both design work and coding work - to figure out
how this should integrate into the rest of the system. (What should
be the value of pg_class.relkind? Where should the node
representation of the snapshot query be stored? And did we handle all
of those OID dependencies correctly?)

Where I can see this possibly falling down (other than being just too
much work for a relative PostgreSQL novice to get it done in one
summer) is if there are concerns about it being incompatible with
incrementally-updated views. I imagine that we're going to want to
eventually support both, so we need to make sure that this
implementation doesn't box us into a corner. But as far as snapshot
views go, complaining that the proposed locking is too strong doesn't
seem quite fair. Fixing that, AFAICS, is a very hard project,
possibly involving significant planner support and an implementation
of MERGE, and I would much rather try to land a fundamentals patch
like this first and then deal with the gyrations that will be involved
in making this work than try to land the whole thing all at once.

Of course, if I'm missing something, and there's a SIMPLE way to get
materialized views that can be refreshed without a full-table lock,
that's another story altogether - maybe you have an idea?

Finally, even if we decided NOT to merge this patch because of the
limitations you mention (and right now that doesn't seem to be the
consensus), having this part of it completed as a starting point for
future work might be reason enough by itself.

In short: I think you may be letting the perfect be the enemy of the good.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 11 Apr 2010 05:24

Robert Haas wrote:
> I also think that you're underestimating the number of problems that
> will have to be solved to get this done. It's going to take some
> significant work - both design work and coding work - to figure out
> how this should integrate into the rest of the system. (What should
> be the value of pg_class.relkind? Where should the node
> representation of the snapshot query be stored? And did we handle all
> of those OID dependencies correctly?)
>

I don't think I'm underestimating all that, but I suspect Pavel is by a
considerable amount. This is why I've been suggesting that a GSoC scope
here might just be wrestling with this area of the problem for the whole
summer--not even getting into updates beyond a completely trivial
implementation, if any at all. Things like "handle OID dependencies"
are definitely not on the fun side of the development work that people
tend to think about in advance.

> Where I can see this possibly falling down (other than being just too
> much work for a relative PostgreSQL novice to get it done in one
> summer) is if there are concerns about it being incompatible with
> incrementally-updated views. I imagine that we're going to want to
> eventually support both, so we need to make sure that this
> implementation doesn't box us into a corner.

Exactly my concern; comitting this part without knowing how that's later
going to fit into place strikes me the sort of the thing this project
doesn't like to do. The alternate approach of starting with the update
machinery is less likely IMHO to get stuck wondering if there's a future
blind spot coming or not, since you'd be building from the bottom up
starting with the hardest parts.

From the rest of your comments, I'm comfortable that you're in sync
with the not necessarily obvious risky spots here I wanted to raise
awareness of. It's unreasonable to expect we'll have exactly the same
priorities here, and I doubt it's useful to debate how I perceive the
merit of various development subsets here compared to yourself. I don't
think it's really important whether anyone agrees with me or not about
exactly the value of a full table lock implementation. The main thing
I'm concerned about is just that it's noted as a known risky part, one
that could end up blocking the project's ability to commit even a subset
of the proposed patch here.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Heikki Linnakangas on 11 Apr 2010 10:26

Robert Haas wrote:
> 2010/4/10 Andrew Dunstan <andrew(a)dunslane.net>:
>> Heikki Linnakangas wrote:
>>> 1. Keep the materialized view up-to-date when the base tables change.
>>> This can be further divided into many steps, you can begin by supporting
>>> automatic updates only on very simple views with e.g a single table and
>>> a where clause. Then extend that to support joins, aggregates,
>>> subqueries etc. Keeping it really limited, you could even require the
>>> user to write the required triggers himself.
>> That last bit doesn't strike me as much of an advance. Isn't the whole point
>> of this to automate it? Creating greedy materialized views is usually not
>> terribly difficult now, but you do have to write the triggers.
>
> Yeah, I agree.

It doesn't accomplish anything interesting on its own. But if you do the
planner changes to automatically use the materialized view to satisfy
queries (item 2. in my previous email), it's useful.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Robert Haas on 11 Apr 2010 14:47

On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
<heikki.linnakangas(a)enterprisedb.com> wrote:
> Robert Haas wrote:
>> 2010/4/10 Andrew Dunstan <andrew(a)dunslane.net>:
>>> Heikki Linnakangas wrote:
>>>> 1. Keep the materialized view up-to-date when the base tables change.
>>>> This can be further divided into many steps, you can begin by supporting
>>>> automatic updates only on very simple views with e.g a single table and
>>>> a where clause. Then extend that to support joins, aggregates,
>>>> subqueries etc. Keeping it really limited, you could even require the
>>>> user to write the required triggers himself.
>>> That last bit doesn't strike me as much of an advance. Isn't the whole point
>>> of this to automate it? Creating greedy materialized views is usually not
>>> terribly difficult now, but you do have to write the triggers.
>>
>> Yeah, I agree.
>
> It doesn't accomplish anything interesting on its own. But if you do the
> planner changes to automatically use the materialized view to satisfy
> queries (item 2. in my previous email), it's useful.

But you can't do that with a snapshot view, only a continuous updated one.

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Florian G. Pflug" on 11 Apr 2010 22:13

On 11.04.10 20:47 , Robert Haas wrote:
> On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas
> <heikki.linnakangas(a)enterprisedb.com> wrote:
>> Robert Haas wrote:
>>> 2010/4/10 Andrew Dunstan<andrew(a)dunslane.net>:
>>>> Heikki Linnakangas wrote:
>>>>> 1. Keep the materialized view up-to-date when the base tables
>>>>> change. This can be further divided into many steps, you can
>>>>> begin by supporting automatic updates only on very simple
>>>>> views with e.g a single table and a where clause. Then extend
>>>>> that to support joins, aggregates, subqueries etc. Keeping it
>>>>> really limited, you could even require the user to write the
>>>>> required triggers himself.
>>>> That last bit doesn't strike me as much of an advance. Isn't
>>>> the whole point of this to automate it? Creating greedy
>>>> materialized views is usually not terribly difficult now, but
>>>> you do have to write the triggers.
>>>
>>> Yeah, I agree.
>>
>> It doesn't accomplish anything interesting on its own. But if you
>> do the planner changes to automatically use the materialized view
>> to satisfy queries (item 2. in my previous email), it's useful.
>
> But you can't do that with a snapshot view, only a continuous updated
> one.

If continuous updates prove to be too hard initially, you could instead
update the view on select if it's outdated. Such a materialized view
would be a kind of inter-session cache for subselects.

The hard part would probably be to figure out how to decide whether the
view is outdated or not, and to deal with two concurrent transactions
trying to use an outdates view (and both trying to refresh it). What
makes the second problem hard is that you wouldn't want one of the
transactions to wait for the other to complete, because this is not how
SELECTs traditionally behave.

best regards, Florian Pflug

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7
Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance