From: Robert Haas on 11 Apr 2010 01:08 On Sat, Apr 10, 2010 at 11:40 PM, Greg Smith <greg(a)2ndquadrant.com> wrote: > To be frank, that makes for a materalized view implementation of little > value over what you can currently do as far as I'm concerned. It might be > interesting as a prototype, but that's not necessarily going to look like > what's needed to do this for real at all. I'm not a big fan of dumping work > into projects when you can see exactly how it's going to fail before you > even get started. As I see if, if you know where it's going to fall down, > you don't need to build a prototype as an exercise to show you how to build > it--you should work on that part first instead. Hopefully, you're already aware that I have enormous respect for your opinions on a wide variety of topics; if not, let me publicly say that I absolutely do. Having said that, I disagree with your conclusions in this instance. I see nothing but upside from this work. It is vastly easier to write a patch that builds on existing functionality than it is to write something new from scratch. If there's any value in having manually refreshed materialized views, then having the simplest possible implementation of what those can look like committed will make it far easier to plan out next steps. While the proposed implementation may not solve a huge number of real-world problems, I think there's a good argument that some people will get good use of it. Not everyone has 1TB tables with continuous access patterns. And, provided that it doesn't conflict with anything we want to do in the future, being useful to some people is a good enough reason to put it in. I also think that you're underestimating the number of problems that will have to be solved to get this done. It's going to take some significant work - both design work and coding work - to figure out how this should integrate into the rest of the system. (What should be the value of pg_class.relkind? Where should the node representation of the snapshot query be stored? And did we handle all of those OID dependencies correctly?) Where I can see this possibly falling down (other than being just too much work for a relative PostgreSQL novice to get it done in one summer) is if there are concerns about it being incompatible with incrementally-updated views. I imagine that we're going to want to eventually support both, so we need to make sure that this implementation doesn't box us into a corner. But as far as snapshot views go, complaining that the proposed locking is too strong doesn't seem quite fair. Fixing that, AFAICS, is a very hard project, possibly involving significant planner support and an implementation of MERGE, and I would much rather try to land a fundamentals patch like this first and then deal with the gyrations that will be involved in making this work than try to land the whole thing all at once. Of course, if I'm missing something, and there's a SIMPLE way to get materialized views that can be refreshed without a full-table lock, that's another story altogether - maybe you have an idea? Finally, even if we decided NOT to merge this patch because of the limitations you mention (and right now that doesn't seem to be the consensus), having this part of it completed as a starting point for future work might be reason enough by itself. In short: I think you may be letting the perfect be the enemy of the good. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Greg Smith on 11 Apr 2010 05:24 Robert Haas wrote: > I also think that you're underestimating the number of problems that > will have to be solved to get this done. It's going to take some > significant work - both design work and coding work - to figure out > how this should integrate into the rest of the system. (What should > be the value of pg_class.relkind? Where should the node > representation of the snapshot query be stored? And did we handle all > of those OID dependencies correctly?) > I don't think I'm underestimating all that, but I suspect Pavel is by a considerable amount. This is why I've been suggesting that a GSoC scope here might just be wrestling with this area of the problem for the whole summer--not even getting into updates beyond a completely trivial implementation, if any at all. Things like "handle OID dependencies" are definitely not on the fun side of the development work that people tend to think about in advance. > Where I can see this possibly falling down (other than being just too > much work for a relative PostgreSQL novice to get it done in one > summer) is if there are concerns about it being incompatible with > incrementally-updated views. I imagine that we're going to want to > eventually support both, so we need to make sure that this > implementation doesn't box us into a corner. Exactly my concern; comitting this part without knowing how that's later going to fit into place strikes me the sort of the thing this project doesn't like to do. The alternate approach of starting with the update machinery is less likely IMHO to get stuck wondering if there's a future blind spot coming or not, since you'd be building from the bottom up starting with the hardest parts. From the rest of your comments, I'm comfortable that you're in sync with the not necessarily obvious risky spots here I wanted to raise awareness of. It's unreasonable to expect we'll have exactly the same priorities here, and I doubt it's useful to debate how I perceive the merit of various development subsets here compared to yourself. I don't think it's really important whether anyone agrees with me or not about exactly the value of a full table lock implementation. The main thing I'm concerned about is just that it's noted as a known risky part, one that could end up blocking the project's ability to commit even a subset of the proposed patch here. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg(a)2ndQuadrant.com www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Heikki Linnakangas on 11 Apr 2010 10:26 Robert Haas wrote: > 2010/4/10 Andrew Dunstan <andrew(a)dunslane.net>: >> Heikki Linnakangas wrote: >>> 1. Keep the materialized view up-to-date when the base tables change. >>> This can be further divided into many steps, you can begin by supporting >>> automatic updates only on very simple views with e.g a single table and >>> a where clause. Then extend that to support joins, aggregates, >>> subqueries etc. Keeping it really limited, you could even require the >>> user to write the required triggers himself. >> That last bit doesn't strike me as much of an advance. Isn't the whole point >> of this to automate it? Creating greedy materialized views is usually not >> terribly difficult now, but you do have to write the triggers. > > Yeah, I agree. It doesn't accomplish anything interesting on its own. But if you do the planner changes to automatically use the materialized view to satisfy queries (item 2. in my previous email), it's useful. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Robert Haas on 11 Apr 2010 14:47 On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas <heikki.linnakangas(a)enterprisedb.com> wrote: > Robert Haas wrote: >> 2010/4/10 Andrew Dunstan <andrew(a)dunslane.net>: >>> Heikki Linnakangas wrote: >>>> 1. Keep the materialized view up-to-date when the base tables change. >>>> This can be further divided into many steps, you can begin by supporting >>>> automatic updates only on very simple views with e.g a single table and >>>> a where clause. Then extend that to support joins, aggregates, >>>> subqueries etc. Keeping it really limited, you could even require the >>>> user to write the required triggers himself. >>> That last bit doesn't strike me as much of an advance. Isn't the whole point >>> of this to automate it? Creating greedy materialized views is usually not >>> terribly difficult now, but you do have to write the triggers. >> >> Yeah, I agree. > > It doesn't accomplish anything interesting on its own. But if you do the > planner changes to automatically use the materialized view to satisfy > queries (item 2. in my previous email), it's useful. But you can't do that with a snapshot view, only a continuous updated one. ....Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: "Florian G. Pflug" on 11 Apr 2010 22:13
On 11.04.10 20:47 , Robert Haas wrote: > On Sun, Apr 11, 2010 at 10:26 AM, Heikki Linnakangas > <heikki.linnakangas(a)enterprisedb.com> wrote: >> Robert Haas wrote: >>> 2010/4/10 Andrew Dunstan<andrew(a)dunslane.net>: >>>> Heikki Linnakangas wrote: >>>>> 1. Keep the materialized view up-to-date when the base tables >>>>> change. This can be further divided into many steps, you can >>>>> begin by supporting automatic updates only on very simple >>>>> views with e.g a single table and a where clause. Then extend >>>>> that to support joins, aggregates, subqueries etc. Keeping it >>>>> really limited, you could even require the user to write the >>>>> required triggers himself. >>>> That last bit doesn't strike me as much of an advance. Isn't >>>> the whole point of this to automate it? Creating greedy >>>> materialized views is usually not terribly difficult now, but >>>> you do have to write the triggers. >>> >>> Yeah, I agree. >> >> It doesn't accomplish anything interesting on its own. But if you >> do the planner changes to automatically use the materialized view >> to satisfy queries (item 2. in my previous email), it's useful. > > But you can't do that with a snapshot view, only a continuous updated > one. If continuous updates prove to be too hard initially, you could instead update the view on select if it's outdated. Such a materialized view would be a kind of inter-session cache for subselects. The hard part would probably be to figure out how to decide whether the view is outdated or not, and to deal with two concurrent transactions trying to use an outdates view (and both trying to refresh it). What makes the second problem hard is that you wouldn't want one of the transactions to wait for the other to complete, because this is not how SELECTs traditionally behave. best regards, Florian Pflug |