Prev: [COMMITTERS] pgsql: Oops, don't forget to rewind the directory before scanning it to
Next: Time travel on the buildfarm
From: Heikki Linnakangas on 23 Feb 2010 10:11 Andrew Dunstan wrote: > Gokulakannan Somasundaram wrote: >> I looked at the postgres nbtree code. From my analysis(which >> might be wrong!), we can implement IOTs, provided we make a decision >> on broken data types issue. > > I am not familiar with this term "broken data types", and I just looked > for it in the source code and couldn't find it. > > What exactly are you referring to? I believe he's referring to the fact that once a key is inserted to an index, it might not be possible to re-find it, if the datatype is broken in such a way that e.g comparison operator returns a different value. For example, today "1 < 2" returns true, but tomorrow it returns false. The decision on that is that you need to deal with it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: "Kevin Grittner" on 23 Feb 2010 10:18 Simon Riggs <simon(a)2ndQuadrant.com> wrote: > On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: >> Gokulakannan Somasundaram wrote: > >> > May i get a little clarification on this issue? Will we be >> > supporting the IOT feature in postgres in future? >> >> What seems like the best path to achieve the kind of performance >> benefits that IOTs offer is allowing index-only-scans using the >> visibility map. > > I don't agree with that. Could you explain why you think that > would be the case? It would be a shame to have everybody think you > can solve a problem if it turned out not to be the case. I'd like to be clear on what feature we're discussing. There has been mention of an organization where there is no heap per se, but all columns are stored in the leaf node of one of the table's indexes (which is the structure referred to as a CLUSTERED INDEX in some other popular products). There has been some mention of storing some of the data out-of-line, which could be considered to be already covered by TOAST. I know that one of the things which makes this technique particularly effective with such things as name columns for a clustered index is that these other products store index entries after the first in a page with a length that matches the previous entry and the differing data at the tail, which we don't yet have. Clearly it's not trivial, but there are certainly cases where it can be a big performance win. Besides the obvious issues around having a relation which functions like both an index and a heap (at the leaf level), there are the details of having other indexes point to these leaf nodes, creating and dropping clustered indexes, impact on vacuums, etc. Situations where clustered indexes tended to help: (1) Most access through a particular index -- often one less random read per access. (2) Frequent sequential access through a range of values in an index -- turn random access into mostly sequential. (3) Index values comprise a large portion of each tuple -- avoid redundant storage, reducing disk footprint, thereby improving cache hits. Points 1 and 2 could be covered to some degree by index-only scans, particularly if additional columns are added to indexes to make them "covering indexes". Index-only scans don't help with 3 at all; in fact, adding the additional columns to indexes to allow that optimization tends to work against it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Simon Riggs on 23 Feb 2010 10:28 On Tue, 2010-02-23 at 17:08 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: > >> Gokulakannan Somasundaram wrote: > > > >>> May i get a little clarification on this issue? Will we be supporting > >>> the IOT feature in postgres in future? > >> What seems like the best path to achieve the kind of performance > >> benefits that IOTs offer is allowing index-only-scans using the > >> visibility map. > > > > I don't agree with that. Could you explain why you think that would be > > the case? It would be a shame to have everybody think you can solve a > > problem if it turned out not to be the case. > > I'm thinking of a scan based on the index key. With an > index-organised-table, you can skip the heap access because the heap and > the index are the same structure. An index-only-scan likewise allows you > to skip the heap access. > > I grant you that an index-organised-table can have other benefits, like > reduced disk space usage (which is good cache efficiency), or less > random I/O required for updates. > > The question was if PostgreSQL will be supporting index-organised-tables > in the future. The answer is "not in the foreseeable future". No-one has > come up with a plausible plan for how to do it, and no-one working on it > at the moment. I think Gokul was asking because he wanted to work on it, but wanted to check community approval first. > I don't want to discourage thinking about pie-in-the-sky features. Planning, is what I would call that. Calling them "pie in the sky" is just a negative label, as much as if someone else called them "obvious next steps" is a positive label. > There's many tricks like column-oriented storage, compression, > index-organised-tables etc. that would be nice to have. Whether any > particular feature is worthwhile in the end, the devil is in the details. I agree that the way to improve things is to focus on a particular architectural technique and then a design for doing that. Going straight to the design and naming it doesn't help at all. That was why I named an earlier project "Frequent Update Optimisation" rather than any of the names that referred to a design. The devil is in the details, I agree. The important part is analysis though, not coding. Which is why I was asking why you were working on index-only scans, though do not doubt your ability to make them work. And also why I would say to Gokul: the right approach isn't to ask "will we be supporting IOTs" and then go and build them. The right approach is to work out what you want to improve and give a clear justification of why, come up with a proposal to do that with analysis of how the proposal will improve the situation and then think about coding. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: Gokulakannan Somasundaram on 24 Feb 2010 03:20 > > > I think Gokul was asking because he wanted to work on it, but wanted to > check community approval first. > Yes the problem is that we need to come to a consensus on broken data types. As Heikki pointed out, those data types, which is based on a unstable function like time, date, random etc. This is definitely a theoretical possibility, but still we want to continue building indexes which supports these features. If we can take a decision regarding this, we can have a feature like IOT.. > > > > There's many tricks like column-oriented storage, compression, > > index-organised-tables etc. that would be nice to have. Whether any > > particular feature is worthwhile in the end, the devil is in the details. > > Please consider my following statements from a database tuner perspective. I don't want to discourage the visibility map feature, but it has the disadvantages, which we already discussed. While i do a explain analyze and i see 300 reads, but the same query in production might lead to 400 reads(with all the extra 100 being random i/os), because of the state of the visibility map. If there is a long batch job running somewhere in the database, it will affect almost all the visibility maps of the relation. So how can a person, tune and test a query in dev and put it in production and be confident about the i/o performance ? Say Visibility map goes into core after 9.x, the performance of those databases will be less compared to the previous release in these circumstances. All i am trying to say is that the visibility map has cases, where it will be ineffective and are we deciding to find solutions in those cases. Thanks, Gokul.
From: Simon Riggs on 24 Feb 2010 03:30
On Wed, 2010-02-24 at 13:50 +0530, Gokulakannan Somasundaram wrote: > Please consider my following statements from a database tuner > perspective. I don't want to discourage the visibility map feature, > but it has the disadvantages, which we already discussed. While i do a > explain analyze and i see 300 reads, but the same query in production > might lead to 400 reads(with all the extra 100 being random i/os), > because of the state of the visibility map. If there is a long batch > job running somewhere in the database, it will affect almost all the > visibility maps of the relation. So how can a person, tune and test a > query in dev and put it in production and be confident about the i/o > performance ? Say Visibility map goes into core after 9.x, the > performance of those databases will be less compared to the previous > release in these circumstances. I would add that both Heikki and Greg Stark have argued at length that the visibility map cannot be relied upon in production systems. Those arguments were deployed when considering the use of the VM for partitioning, yet they apply equally to use of the VM in other contexts. The fragility there is not an issue in a mostly read-only application, but it definitely would be a concern in other cases. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |