From: Andrew Dunstan on 10 Apr 2010 09:15

Heikki Linnakangas wrote:
> 1. Keep the materialized view up-to-date when the base tables change.
> This can be further divided into many steps; you can begin by supporting
> automatic updates only on very simple views with e.g. a single table and
> a WHERE clause. Then extend that to support joins, aggregates,
> subqueries etc. Keeping it really limited, you could even require the
> user to write the required triggers himself.

That last bit doesn't strike me as much of an advance. Isn't the whole point of this to automate it? Creating greedy materialized views is usually not terribly difficult now, but you do have to write the triggers.

The other thing that could be interesting about this would be some scheme for lazy refresh that didn't involve re-extracting the whole data set.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
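[The hand-written-trigger approach Andrew describes can be sketched in a few lines. This is a toy illustration using SQLite through Python's sqlite3 module rather than PostgreSQL, and all table, column, and trigger names are invented for the example; in PostgreSQL you would write the equivalent triggers in PL/pgSQL.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);

-- The "materialized view": per-customer totals kept in an ordinary table.
CREATE TABLE order_totals (customer TEXT PRIMARY KEY, total REAL);

-- Eager maintenance: triggers fold each base-table change into the summary.
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT OR IGNORE INTO order_totals (customer, total) VALUES (NEW.customer, 0);
    UPDATE order_totals SET total = total + NEW.amount
        WHERE customer = NEW.customer;
END;

CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    UPDATE order_totals SET total = total - OLD.amount
        WHERE customer = OLD.customer;
END;
""")

conn.execute("INSERT INTO orders (customer, amount) VALUES ('a', 10.0)")
conn.execute("INSERT INTO orders (customer, amount) VALUES ('a', 5.0)")
total = conn.execute(
    "SELECT total FROM order_totals WHERE customer = 'a'").fetchone()[0]
print(total)  # 15.0
```

The point of the thread, of course, is that users shouldn't have to write this maintenance logic by hand for every view.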
From: Robert Haas on 10 Apr 2010 09:18

2010/4/10 Andrew Dunstan <andrew(a)dunslane.net>:
> Heikki Linnakangas wrote:
>> 1. Keep the materialized view up-to-date when the base tables change.
>> This can be further divided into many steps; you can begin by supporting
>> automatic updates only on very simple views with e.g. a single table and
>> a WHERE clause. Then extend that to support joins, aggregates,
>> subqueries etc. Keeping it really limited, you could even require the
>> user to write the required triggers himself.
>
> That last bit doesn't strike me as much of an advance. Isn't the whole point
> of this to automate it? Creating greedy materialized views is usually not
> terribly difficult now, but you do have to write the triggers.

Yeah, I agree.

> The other thing that could be interesting about this would be some scheme
> for lazy refresh that didn't involve re-extracting the whole data set.

One way to do this would be to infer a primary key for the result set based on the input query. But I think we don't really have the infrastructure to do this right now, so not really a project for a beginner.

...Robert
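[Robert's inferred-primary-key idea amounts to diffing a freshly computed result set against the stored one by key, and touching only the rows that changed rather than re-extracting everything. A minimal sketch of that diff step, with plain dictionaries standing in for the view's stored and recomputed contents; all names here are invented for illustration:]

```python
def incremental_refresh(stored, fresh):
    """Diff two {primary_key: row} mappings and return the minimal
    change set needed to bring `stored` in line with `fresh`."""
    inserts = {k: v for k, v in fresh.items() if k not in stored}
    deletes = [k for k in stored if k not in fresh]
    updates = {k: v for k, v in fresh.items()
               if k in stored and stored[k] != v}
    return inserts, updates, deletes

# Current view contents vs. a newly recomputed result set.
stored = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
fresh  = {1: ("alice", 10), 2: ("bob", 25), 4: ("dave", 40)}

inserts, updates, deletes = incremental_refresh(stored, fresh)
print(inserts)  # {4: ('dave', 40)}
print(updates)  # {2: ('bob', 25)}
print(deletes)  # [3]
```

The hard part Robert flags is not the diff itself but proving that a given view query yields a usable key in the first place.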
From: "Kevin Grittner" on 10 Apr 2010 13:13

Greg Smith wrote:
> And work on MERGE support is itself blocked behind the fact that
> PostgreSQL doesn't have a good way to lock access to a key value
> that doesn't exist yet--what other databases call key range
> locking.

The bulk of the serializable implementation WIP is work to implement just this sort of locking. There are already a couple of possible spin-off uses on the horizon, based on the ability of these locks to survive their initiating transactions and detect conflicting writes. Both spinoffs involve somehow flagging a transaction as one for which the locks should be kept until further notice, and issuing a notification when a conflicting write occurs. That seems consistent with the needs of materialized views, too.

It probably won't be solid in time to be useful for GSoC, but if someone's looking to map out a plan for materialized views, I thought this information might be germane.

-Kevin
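[The key-range bookkeeping Kevin describes can be pictured with a toy model: a transaction registers the key ranges its view definition read, the locks outlive the transaction, and a later write inside a registered range produces a "this view is now stale" notification. This is only an illustrative sketch, not PostgreSQL code; the class and names are invented.]

```python
class RangeLockTable:
    """Toy key-range lock table. Owners register half-open key ranges
    they have read; writes that land inside a registered range report
    a conflict with those owners (modelling the staleness notification)."""

    def __init__(self):
        self.locks = []  # (owner, low, high) half-open ranges, kept
                         # until explicitly released, i.e. they survive
                         # the transaction that took them

    def lock_range(self, owner, low, high):
        self.locks.append((owner, low, high))

    def notify_write(self, key):
        """Return owners whose registered range covers the written key."""
        return [owner for owner, low, high in self.locks if low <= key < high]

table = RangeLockTable()
table.lock_range("matview_refresh_1", 100, 200)  # view read keys 100..199

hit = table.notify_write(150)   # ['matview_refresh_1'] -> view is stale
miss = table.notify_write(250)  # [] -> no conflict, view still fresh
print(hit, miss)
```

Note that a range lock covers keys that do not exist yet, which is exactly what single-row locks cannot express.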
From: Greg Smith on 10 Apr 2010 23:03

Heikki Linnakangas wrote:
> Your proposal basically describes doing 1, in a limited fashion where
> the view is not updated automatically, but only when the DBA runs a
> command to refresh it. I'm not sure if that's useful enough on its own;
> writing "CREATE MATERIALIZED VIEW ... SELECT ..." doesn't seem any
> easier than just writing "CREATE TABLE AS ...". But if you can do
> something about 2, or even a very limited part of 1, keeping the view
> up-to-date automatically, it becomes much more useful.

You've hit upon the core issue here. You can build materialized views right now using "CREATE TABLE AS". You can even update them by creating a new table the same way, with a new name, and doing the LOCK/RENAME/DROP shuffle--what people used to do for rewriting bloated tables before there was CLUSTER. The first step in the proposal here is essentially syntax to give an easier UI for that. That's an interesting step, but recognize that it doesn't actually provide anything you can't do already.

If you then note that doing any sort of incremental update to the view is a hard problem, and that a lot of the useful cases for materialized views involve tables where it's impractical to recreate the whole thing anyway, you'll inevitably find yourself deeply lost in the minutiae of how to handle the updates. It's really the core problem in building what people expect from a materialized view implementation in a serious database. Chipping away at the other pieces around it doesn't move the feature that far forward; even if you get every single one of them except incremental updates finished, everything else combined is still not that much work in comparison to the issues around updates. There certainly are a fair number of subproblems you can break out of here.
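[The rebuild-and-swap shuffle Greg mentions can be sketched end to end. For a self-contained illustration this uses SQLite through Python's sqlite3 module; in PostgreSQL the same dance would use LOCK TABLE, ALTER TABLE ... RENAME, and DROP TABLE inside one transaction. All object names are invented.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (k INTEGER, v INTEGER)")
conn.executemany("INSERT INTO base VALUES (?, ?)", [(1, 10), (1, 20), (2, 30)])

# Initial "materialized view": plain CREATE TABLE ... AS SELECT.
conn.execute("CREATE TABLE mv AS SELECT k, SUM(v) AS total FROM base GROUP BY k")

conn.execute("INSERT INTO base VALUES (2, 5)")  # base changes; mv is now stale

# Full refresh by rebuild-and-swap: build the new contents under a
# temporary name, then swap it into place in a single transaction.
with conn:
    conn.execute(
        "CREATE TABLE mv_new AS SELECT k, SUM(v) AS total FROM base GROUP BY k")
    conn.execute("ALTER TABLE mv RENAME TO mv_old")
    conn.execute("ALTER TABLE mv_new RENAME TO mv")
    conn.execute("DROP TABLE mv_old")

rows = conn.execute("SELECT k, total FROM mv ORDER BY k").fetchall()
print(rows)  # [(1, 30), (2, 35)]
```

This is exactly the "full re-extraction" cost the thread wants to avoid: the entire view is recomputed even though only one base row changed.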
I just think it's important to recognize that the path that leads to a useful GSoC project and the one that gives a production-quality materialized view implementation may not have that much in common, and to manage expectations on both sides accordingly. If Pavel thinks he's going to end up being able to say "I added materialized views to PostgreSQL" at the end of the summer, that's going to end in disappointment. And if people think this project plan will lead to being able to claim PostgreSQL now has this feature, that's also not going to go well. If the scope is "add initial grammar and rewriting moving toward a future materialized view feature", with the underlying implementation noted as a stub prototype, that might work out OK. This is why I likened it to the work on "Syntax for partitioning", which has a similarly focused subgoal structure.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.us
From: Greg Smith on 10 Apr 2010 23:40
Robert Haas wrote:
> It's not obvious to me that a brief full-table lock wouldn't be
> acceptable for an initial implementation. Obviously it wouldn't be
> suitable for every use case, but since we're talking about manually
> refreshed views that was bound to be true anyway.

There already is an initial implementation of sorts. There are a couple of ways you can build these right now, so any new development has to look like it will end with good odds of being an improvement over what's already available before it's worth putting development resources into it.

As a rough idea of what people want these for in the field, based on what I've seen requests for, imagine that someone has a 1TB table they're materializing a view on in order to get at least a 10:1, and hopefully close to a 100:1, speedup on viewing summary data. Now, picture what happens if you have someone doing a sequential scan on the MV, which is still quite big, the updater process lines up to grab an exclusive lock when it's done, and then a second user wanting to read a single row quickly comes along behind it. Given a full-table lock implementation, that scenario is unlikely to play out with the second user getting a fast response. They'll likely sit in a lock queue for some potentially long period of time instead, waiting for the active seq scan to finish and then for the update to happen. You have to build it that way, or a steady stream of people reading could block out updates forever.

To be frank, that makes for a materialized view implementation of little value over what you can currently do, as far as I'm concerned. It might be interesting as a prototype, but that's not necessarily going to look like what's needed to do this for real at all. I'm not a big fan of dumping work into projects when you can see exactly how it's going to fail before you even get started.
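[The queueing behaviour described above can be made concrete with a toy FIFO lock queue: shared locks are mutually compatible, an exclusive request conflicts with everything, and grants are strictly first-come-first-served so readers can't starve the writer. This is a heavy simplification of a real lock manager, and all names are invented.]

```python
class FifoLockQueue:
    """Toy table-lock queue. Shared locks coexist; exclusive locks
    conflict with all others; waiters are granted strictly in FIFO
    order, so a stream of readers cannot starve a queued writer."""

    def __init__(self):
        self.holders = []  # (name, mode) currently granted
        self.waiters = []  # (name, mode) queued in arrival order

    def request(self, name, mode):
        compatible = (mode == "shared"
                      and all(m == "shared" for _, m in self.holders))
        if compatible and not self.waiters:  # FIFO: never jump the queue
            self.holders.append((name, mode))
            return "granted"
        self.waiters.append((name, mode))
        return "queued"

q = FifoLockQueue()
r1 = q.request("seq_scan", "shared")        # long scan reading the MV
r2 = q.request("refresher", "exclusive")    # refresh queued behind the scan
r3 = q.request("fast_lookup", "shared")     # single-row read queued too!
print(r1, r2, r3)
```

The third request is the problem case: a lookup that should take milliseconds sits behind the exclusive refresh, which itself sits behind the long scan.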
As I see it, if you know where it's going to fall down, you don't need to build a prototype as an exercise to show you how to build it--you should work on that part first instead.