GSoC - proposal - Materialized Views in PostgreSQL [PgSql]

Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance

From: Greg Smith on 12 Apr 2010 18:46

Josh Berkus wrote:
> What would be the use case for (1) by itself?
>

There isn't any use case for just working on the infrastructure, just
like there's no use case for "Syntax for partitioning" on its own. That
why people rarely work on that part of these problems--it's boring and
produces no feature of value on its own. I believe that in both cases,
attempts to build the more complicated parts, ones that don't first
address some of the core infrastructure first, will continue to produce
only prototypes.

I don't want to see Materialized Views wander down the same path as
partitioning, where lots of people produce "fun parts" patches, while
ignoring the grunt work of things like production quality catalog
support for the feature. I think Pavel's proposal got that part right
by starting with the grammar and executor setup trivia. And Robert's
comments about the details in that area it's easy to forget about hit
the mark too.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Josh Berkus on 12 Apr 2010 21:31

> I don't want to see Materialized Views wander down the same path as
> partitioning, where lots of people produce "fun parts" patches, while
> ignoring the grunt work of things like production quality catalog
> support for the feature. I think Pavel's proposal got that part right
> by starting with the grammar and executor setup trivia. And Robert's
> comments about the details in that area it's easy to forget about hit
> the mark too.

Good point. And GSoC may be one of the few times we can get people to
do that kind of work. Other than Simon, of course. ;-)

I just worry about any feature which doesn't get as far as a
user-visible implementation. If someone doesn't do the rest of the
parts soon, such features tend to atrophy because nobody is using them.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Smith on 13 Apr 2010 01:20

Josh Berkus wrote:
> I just worry about any feature which doesn't get as far as a
> user-visible implementation. If someone doesn't do the rest of the
> parts soon, such features tend to atrophy because nobody is using them.
>

While they're limited, there are complexly viable prototype quality
implementations possible here without a large amount of work to get them
started. I'm not worried too much about this feature being unused. As
I was just reminded when assembling an page on the wiki about it:
http://wiki.postgresql.org/wiki/Materalized_Views it's currently ranked
#1--by a large margin--on the UserVoice feature request survey that
Peter kicked off.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.us

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Tom Lane on 13 Apr 2010 16:00

Josh Berkus <josh(a)agliodbs.com> writes:
> There are basically 2 major parts for materialized views:

> A) Planner: Getting the query planner to swap in the MatView for part of
> a query automatically for query plan portions which the MatView supports;

> B) Maintenance: maintaining the MatView data according to the programmed
> scheme (synch, asynch, periodic).

> I do not believe it is possible to do both of the above in one summer.
> Of the two, (A) would be more useful since it is possible to manually
> implement (B) using triggers, queues and cron jobs today.

A patch that implements only (A) will be DOA. The reason is that the
planner can *never* "swap in a MatView" on its own authority, unless it
can prove that this does not change the semantics of the query. Which
it obviously will be unable to do unless there's a fully transparent
continuous-update scheme in place.

So the correct approach is to work on (B) first. When and if we get to
a sufficiently transparent update implementation, we can think about
changing the planner.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Pavel on 20 Apr 2010 13:59

Greg Smith wrote:

> pavelbaros wrote:
>> I am also waiting for approval for my repository named
>> "materialized_view" on git.postgresql.org, so I could publish
>> completed parts.
>
> Presuming that you're going to wander there and get assigned what
> looks like an official repo name for this project is a
> bit...optimistic. I would recommend that you publish to something
> like github instead (you can fork http://github.com/postgres/postgres
> ), and if the work looks good enough that it gets picked up by the
> community maybe you migrate it onto the main site eventually.
> git.postgresql.org is really not setup to be general hosting space for
> everyone who has a PostgreSQL related project; almost every repo on
> there belongs to someone who has already been a steady project
> contributor for a number of years.

Yes, you're true, I'm kind of newbe in this kind of project and
specially in PostgreSQL. But I think it is best way to get into
PostgreSQL. When I chose my bachelor thesis I did not know I could
participate GSoC or try to make it commitable. Anyway I will make repo
on github, so everybody could look at it, as soon as posible.

<http://github.com/pbaros/postgres>
>
> (Switching to boilerplate mode for a paragraph...) You have picked a
> PostgreSQL feature that is dramatically more difficult than it appears
> to be, and I wouldn't expect you'll actually finish even a fraction of
> your goals in a summer of work. You're at least in plentiful
> company--most students do the same. As a rule, if you see a feature
> on our TODO list that looks really useful and fun to work on, it's
> only still there because people have tried multiple times to build it
> completely but not managed to do so because it's harder than it
> appears. This is certainly the case with materialized views.
>
> You've outlined a reasonable way to build a prototype that does a
> limited implementation here. The issue is what it will take to extend
> that into being production quality for the real-world uses of
> materialized views. How useful your prototype is depends on how well
> it implements a subset of that in a way that will get used by the
> final design.
>
> The main hidden complexity in this particular project relates to
> handling view refreshes. The non-obvious problem is that when the
> view updates, you need something like a SQL MERGE to really handle
> that in a robust way that doesn't conflict with concurrent access to
> queries against the materialized view. And work on MERGE support is
> itself blocked behind the fact that PostgreSQL doesn't have a good way
> to lock access to a key value that doesn't exist yet--what other
> databases call key range locking. See the notes for "Add SQL-standard
> MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo
> for more information.
>
> You can work around that to build a prototype by grabbing a full table
> lock on the materialized view when updating it, but that's not a
> production quality solution. Solving that little detail is actually
> more work than the entire project you've outlined. Your suggested
> implementation--"In function CloseIntoRel executor swap relfilenode's
> of temp table and original table and finally delete temp table"--is
> where the full table lock is going to end up at. The exact use cases
> that need materialized views cannot handle a CLUSTER-style table
> recreation each time that needs an exclusive lock to switchover, so
> that whole part of your design is going to be a prototype that doesn't
> work at all like what needs to get built to make this feature
> committable. It's also not a reasonable assumption that you have
> enough disk space to hold a second copy of the MV in a production system.

For now I know it is not commitable in actual state, but for my thesis
it is enough and I know it will not be commitable with this design at
all. In case of GSoC it will depends on the time I will be able to spend
on it, if I will consider some other design.

>
> Once there's a good way to merge updates, how to efficiently generate
> them against the sort of large data sets that need materalized
> views--so you just write out the updates rather than a whole new
> copy--is itself a large project with a significant quantity of
> academic research to absorb before starting. Dan Colish at Portland
> State has been playing around with prototypes for the specific problem
> of finding a good algorithm for view refreshing that is compatible
> with PostgreSQL's execution model. He's already recognized the table
> lock issue here and for the moment is ignoring that part. I don't
> have a good feel yet for how long the targeted update code will take
> to mature, but based on what I do know I suspect that little detail is
> also a larger effort than the entire scope you're envisioning.
> There's a reason why the MIT Press compendium "Materialized Views:
> Techniques, Implementations, and Applications" is over 600 pages
> long--I hope you've already started digging through that material.

I would like to start to dig through that, but I'm in a hurry now. I
already have made a small research on MV as part of my thesis. I also
plan to continue study PostgreSQL and Materialized Views more into the
depth, preferably as my master thesis. But I realize MV feature
commitable to PostgreSQL is not project for one person, of course.

>
> Now, with all that said, that doesn't mean there's not a useful
> project for you buried in this mess. The first two steps in your plan:
>
> 1) create materialized view
> 2) change rewriter
>
> Include building a prototype grammer, doing an initial executor
> implementation, and getting some sort of rewriter working. That is
> potentially good groundwork to lay here. I would suggest that you
> completely drop your step 3:
>
> 3) create command that takes snapshot (refresh MV)
>
> Because you cannot built that in a way that will be useful (and by
> that I mean committable quality) until there's a better way to handle
> updates than writing a whole new table and grabbing a full relation
> lock to switch to it. To do a good job just on the first two steps
> should take at least a whole summer anyway--there's a whole stack of
> background research needed I haven't seen anyone do yet, and that
> isn't on your plan yet. There is a precedent for taking this
> approach. After getting stalled trying to add the entirety of easy
> partitioning to PostgreSQL, the current scope has been scaled back to
> just trying to get the syntax and on-disk structure right, then finish
> off the implementation. See
> http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how
> that's been broken into those two major chunks.
>

Anyway thanks for all of your advices and help.

best regards,
Pavel Baros

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7
Prev: [HACKERS] testing hot standby
Next: [HACKERS] testing HS/SR - 1 vs 2 performance