From: "Kevin Grittner" on 9 Apr 2010 16:53 pavelbaros <baros.p(a)seznam.cz> wrote: > I am also waiting for approval for my repository named > "materialized_view" on git.postgresql.org They seem to prefer that you get a repository under your name and use materialized_view as a branch name. See my account on git.postgresql.org and its serializable branch for an example. I learned by putting in a request similar to your pending one. ;-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 9 Apr 2010 18:05 2010/4/9 pavelbaros <baros.p(a)seznam.cz>: > Implementation: could be divided to few steps: > > 1) create materialized view > - modify grammar (parser): CREATE MATERIALIZED VIEW mv_name AS SELECT ... > - change executor, so that it will create physical table defined by select > statement This basically needs to work the same was as CREATE TABLE ... AS SELECT ... - save that it should also stuff the rewritten query someplace, so that it can be re-executed. I think one of the important design questions here is figuring out exactly where that "someplace" should be. I also suspect that we want to block any write access to the relation except for view refreshes. IOW, INSERT, UPDATE, and DELETE on the underlying relation should be rejected (though perhaps rewrite rules redirecting such operations to other tables could be allowed). > 2) change rewriter > - usually, view is relation with defined rule and when rewriting, rule is > fired and relation (view) is replaced by definition of view. If relation do > not have rule, planner and executor behave to it as physical table > (relation). In case of materialized view we want to rewrite select statement > only in case when we refreshing MV. In other cases rewriter should skip > rewriting and pick up physical relation. Exclude situation when other > rewrite rules which are not related to MV definition are specified. > > 3) create command that takes snapshot (refresh MV) > - modify grammar: ALTER MATERIALIZED VIEW mv_name REFRESH; > - taking snapshot (refreshing) is similar to command "SELECT INTO ..." and I > decided to follow the way it works. After parsing query and before > transformation is MANUALLY created tree representation of "SELECT * INTO > ..." with flag IntoClause->isrefresh set true, indicating it is refreshing > materialized view. Everithing acts as it would be regular "SELECT INTO ...." > except functions OpenIntoRel() and CloseIntoRel(). In function OpenIntoRel > is created temp table (without catalog) and set as destination for result of > select. In function CloseIntoRel executor swap relfilenode's of temp table > and original table and finally delete temp table. Behavior of CloseIntoRel > function is inspired by CLUSTER statement. I'll have to read the code before I can comment on the rest of this in detail. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 9 Apr 2010 23:53 pavelbaros wrote: > I am also waiting for approval for my repository named > "materialized_view" on git.postgresql.org, so I could publish > completed parts. Presuming that you're going to wander there and get assigned what looks like an official repo name for this project is a bit...optimistic. I would recommend that you publish to something like github instead (you can fork http://github.com/postgres/postgres ), and if the work looks good enough that it gets picked up by the community maybe you migrate it onto the main site eventually. git.postgresql.org is really not setup to be general hosting space for everyone who has a PostgreSQL related project; almost every repo on there belongs to someone who has already been a steady project contributor for a number of years. (Switching to boilerplate mode for a paragraph...) You have picked a PostgreSQL feature that is dramatically more difficult than it appears to be, and I wouldn't expect you'll actually finish even a fraction of your goals in a summer of work. You're at least in plentiful company--most students do the same. As a rule, if you see a feature on our TODO list that looks really useful and fun to work on, it's only still there because people have tried multiple times to build it completely but not managed to do so because it's harder than it appears. This is certainly the case with materialized views. You've outlined a reasonable way to build a prototype that does a limited implementation here. The issue is what it will take to extend that into being production quality for the real-world uses of materialized views. How useful your prototype is depends on how well it implements a subset of that in a way that will get used by the final design. The main hidden complexity in this particular project relates to handling view refreshes. The non-obvious problem is that when the view updates, you need something like a SQL MERGE to really handle that in a robust way that doesn't conflict with concurrent access to queries against the materialized view. And work on MERGE support is itself blocked behind the fact that PostgreSQL doesn't have a good way to lock access to a key value that doesn't exist yet--what other databases call key range locking. See the notes for "Add SQL-standard MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo for more information. You can work around that to build a prototype by grabbing a full table lock on the materialized view when updating it, but that's not a production quality solution. Solving that little detail is actually more work than the entire project you've outlined. Your suggested implementation--"In function CloseIntoRel executor swap relfilenode's of temp table and original table and finally delete temp table"--is where the full table lock is going to end up at. The exact use cases that need materialized views cannot handle a CLUSTER-style table recreation each time that needs an exclusive lock to switchover, so that whole part of your design is going to be a prototype that doesn't work at all like what needs to get built to make this feature committable. It's also not a reasonable assumption that you have enough disk space to hold a second copy of the MV in a production system. Once there's a good way to merge updates, how to efficiently generate them against the sort of large data sets that need materalized views--so you just write out the updates rather than a whole new copy--is itself a large project with a significant quantity of academic research to absorb before starting. Dan Colish at Portland State has been playing around with prototypes for the specific problem of finding a good algorithm for view refreshing that is compatible with PostgreSQL's execution model. He's already recognized the table lock issue here and for the moment is ignoring that part. I don't have a good feel yet for how long the targeted update code will take to mature, but based on what I do know I suspect that little detail is also a larger effort than the entire scope you're envisioning. There's a reason why the MIT Press compendium "Materialized Views: Techniques, Implementations, and Applications" is over 600 pages long--I hope you've already started digging through that material. Now, with all that said, that doesn't mean there's not a useful project for you buried in this mess. The first two steps in your plan: 1) create materialized view 2) change rewriter Include building a prototype grammer, doing an initial executor implementation, and getting some sort of rewriter working. That is potentially good groundwork to lay here. I would suggest that you completely drop your step 3: 3) create command that takes snapshot (refresh MV) Because you cannot built that in a way that will be useful (and by that I mean committable quality) until there's a better way to handle updates than writing a whole new table and grabbing a full relation lock to switch to it. To do a good job just on the first two steps should take at least a whole summer anyway--there's a whole stack of background research needed I haven't seen anyone do yet, and that isn't on your plan yet. There is a precedent for taking this approach. After getting stalled trying to add the entirety of easy partitioning to PostgreSQL, the current scope has been scaled back to just trying to get the syntax and on-disk structure right, then finish off the implementation. See http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how that's been broken into those two major chunks. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg(a)2ndQuadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 10 Apr 2010 00:32 2010/4/9 Greg Smith <greg(a)2ndquadrant.com>: > The main hidden complexity in this particular project relates to handling > view refreshes. The non-obvious problem is that when the view updates, you > need something like a SQL MERGE to really handle that in a robust way that > doesn't conflict with concurrent access to queries against the materialized > view. And work on MERGE support is itself blocked behind the fact that > PostgreSQL doesn't have a good way to lock access to a key value that > doesn't exist yet--what other databases call key range locking. See the > notes for "Add SQL-standard MERGE/REPLACE/UPSERT command" at > http://wiki.postgresql.org/wiki/Todo for more information. > > You can work around that to build a prototype by grabbing a full table lock > on the materialized view when updating it, but that's not a production > quality solution. Solving that little detail is actually more work than the Hmm... I am not sure you're right about this. It's not obvious to me that a brief full-table lock wouldn't be acceptable for an initial implementation. Obviously it wouldn't be suitable for every use case but since we're talking about manually refreshed views that was bound to be true anyway. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 10 Apr 2010 03:07 Greg Smith wrote: > The main hidden complexity in this particular project relates to > handling view refreshes. The non-obvious problem is that when the view > updates, you need something like a SQL MERGE to really handle that in a > robust way that doesn't conflict with concurrent access to queries > against the materialized view. And work on MERGE support is itself > blocked behind the fact that PostgreSQL doesn't have a good way to lock > access to a key value that doesn't exist yet--what other databases call > key range locking. See the notes for "Add SQL-standard > MERGE/REPLACE/UPSERT command" at http://wiki.postgresql.org/wiki/Todo > for more information. > > You can work around that to build a prototype by grabbing a full table > lock on the materialized view when updating it, but that's not a > production quality solution. It would still be useful for many applications. And it would provide a basis to extend later. You don't need to solve all problems at once, as long as what you implement is a useful subset. > Now, with all that said, that doesn't mean there's not a useful project > for you buried in this mess. The first two steps in your plan: > > 1) create materialized view > 2) change rewriter > > Include building a prototype grammer, doing an initial executor > implementation, and getting some sort of rewriter working. That is > potentially good groundwork to lay here. I would suggest that you > completely drop your step 3: > > 3) create command that takes snapshot (refresh MV) > > Because you cannot built that in a way that will be useful (and by that > I mean committable quality) until there's a better way to handle updates > than writing a whole new table and grabbing a full relation lock to > switch to it. To do a good job just on the first two steps should take > at least a whole summer anyway--there's a whole stack of background > research needed I haven't seen anyone do yet, and that isn't on your > plan yet. There is a precedent for taking this approach. After getting > stalled trying to add the entirety of easy partitioning to PostgreSQL, > the current scope has been scaled back to just trying to get the syntax > and on-disk structure right, then finish off the implementation. See > http://wiki.postgresql.org/wiki/Table_partitioning to get an idea how > that's been broken into those two major chunks. The good thing about this subject for GSoC is that it can be divided into many small steps. There's two largely independent main parts: 1. Keep the materialized view up-to-date when the base tables change. This can be further divided into many steps, you can begin by supporting automatic updates only on very simple views with e.g a single table and a where clause. Then extend that to support joins, aggregates, subqueries etc. Keeping it really limited, you could even require the user to write the required triggers himself. 2. Teach the planner to use materialized views automatically when a query references the base tables. So if you issue the query "SELECT * FROM table WHERE foo > 10 AND bar = 10", and there's a materialized view on "SELECT * FROM table WHERE bar = 10", the planner can transform the original query into "SELECT * FROM materializedview WHERE foo > 10". This largely depends on 1, although some DBMSs offer the option to use manually refreshed materialized views too, knowing that they might not be completely up-to-date. There's a lot room to choose which problems you want to tackle, which is good for a summer-of-code project. Your proposal basically describes doing 1, in a limited fashion where the view is not updated automatically, but only when the DBA runs a command to refresh it. I'm not sure if that's useful enough on its own, writing "CREATE MATERIALIZED VIEW ... SELECT ..." doesn't seem any easier than just writing "CREATE TABLA AS ...". But if you can do something about 2, or even a very limited part of 1, keeping the view up-to-date automatically, it becomes much more useful. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
|
Next
|
Last
Pages: 1 2 3 4 5 6 7 Prev: [HACKERS] testing hot standby Next: [HACKERS] testing HS/SR - 1 vs 2 performance |