From: Greg Stark on 19 Oct 2009 19:39

On Mon, Oct 19, 2009 at 2:54 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind.

I think previous proposals along these lines have fallen down when you
actually try to work out a formula. The problem is that you could have
a table much smaller than effective_cache_size that is nevertheless
never in cache, because it is one of many such tables.

I think it would still be good to have some naive kind of heuristic
here, as long as it's fairly predictable for DBAs.

But the long-term strategy here, I think, is to actually have some way
to measure the real cache hit rate on a per-table basis. Whether it's
by timing I/O operations, programmatic access to DTrace, or some other
kind of OS interface, if we could know the real cache hit rate it
would be very helpful.

Perhaps we could extrapolate from the shared-buffer hit percentage: if
a moderately high fraction of a table's accesses are satisfied from
shared buffers, it seems reasonable to assume the filesystem cache has
a similar distribution.

--
greg
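One rough per-table number is already available from the statistics
collector: the shared-buffer hit ratio per relation. It says nothing
about the filesystem cache, and it resets with the stats, but as a
sketch of what "measuring the real hit rate" could start from
(assuming stats collection is enabled):

    -- Per-table heap hit ratio in shared buffers (OS cache not visible here).
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS heap_hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC;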
From: Robert Haas on 19 Oct 2009 20:29

On Mon, Oct 19, 2009 at 5:54 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Robert Haas <robertmhaas(a)gmail.com> wrote:
>
>> I've been wondering if it might make sense to have a
>> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
>> to compensate for the fact that different media might be faster or
>> slower, and a percent-cached setting for each table over top of
>> that.
>
> [after recovering from the initial cringing reaction...]
>
> How about calculating an effective percentage based on other
> information. effective_cache_size, along with relation and database
> size, come to mind. How about the particular index being considered
> for the plan? Of course, you might have to be careful about working
> in TOAST table size for a particular query, based on the columns
> retrieved.

I think that a per-tablespace page cost should be set by the DBA, just
as we do with the global page costs now. On the other hand, I think a
per-relation percent-in-cache should be calculated automatically by
the database (somehow), with an option for the DBA to override it in
case the database gets it wrong. I gave a lightning talk on this topic
at PGCon.

> I have no doubt that there would be some major performance regressions
> in the first cut of anything like this, for at least *some* queries.
> The toughest part of this might be to get adequate testing to tune it
> for a wide enough variety of real-life situations.

Agreed.

...Robert
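A sketch of what the DBA-facing half of this might look like, using
hypothetical reloptions-style syntax for tablespaces (not something
the server accepts as of this thread; tablespace names and cost values
are illustrative only):

    -- Hypothetical: cheaper random I/O on an SSD tablespace,
    -- more expensive on a slow archive array.
    ALTER TABLESPACE fast_ssd  SET (random_page_cost = 1.5, seq_page_cost = 1.0);
    ALTER TABLESPACE slow_sata SET (random_page_cost = 6.0, seq_page_cost = 2.0);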
From: Jeff Davis on 19 Oct 2009 20:34

On Mon, 2009-10-19 at 16:39 -0700, Greg Stark wrote:
> But the long-term strategy here, I think, is to actually have some way
> to measure the real cache hit rate on a per-table basis. Whether it's
> by timing I/O operations, programmatic access to DTrace, or some other
> kind of OS interface, if we could know the real cache hit rate it
> would be very helpful.

Maybe it would be simpler to just capture the high-order bit: is this
table likely to be completely in cache (shared buffers or OS buffer
cache), or not?

Low cache hit ratios are uninteresting: the performance difference
between 1% and 50% is only about a factor of two. And high ratios
short of "almost 100%" seem unlikely in practice: what kind of
scenario would involve a stable 90% cache hit ratio for a table?

Regards,
	Jeff Davis
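The factor-of-two point follows from treating the expected cost of a
page fetch as a weighted average of a cached and an uncached fetch;
the two cost values below are illustrative assumptions, not planner
constants:

    -- Expected page-fetch cost at various hit ratios, assuming a cached
    -- fetch costs 0.01 and an uncached fetch costs 4.0 (illustrative numbers).
    SELECT hit_ratio,
           hit_ratio * 0.01 + (1 - hit_ratio) * 4.0 AS expected_cost
    FROM (VALUES (0.01), (0.50), (0.90), (0.99)) AS t(hit_ratio);
    -- 1% -> ~3.96, 50% -> ~2.01, 90% -> ~0.41, 99% -> ~0.05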
From: Jeff Janes on 19 Oct 2009 22:21

On Mon, Oct 19, 2009 at 4:29 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> marcin mank <marcin.mank(a)gmail.com> writes:
>>>> This proposal is just "hints by the back door", ISTM. As Tom says,
>>>> there is a justification for having it on tablespaces but not on
>>>> individual tables.
>>
>>> If the parameter is defined as "the chance that a page is in cache"
>>> there is very real physical meaning to it.
>>
>> We have no such parameter...
>
> And we want our parameters to be things the DBA has a chance of being
> able to estimate.

Do the current parameters meet that standard? When setting
seq_page_cost now, don't people have a lot of "Well, we're about this
likely to find it in the cache anyway" built into their settings?

Jeff
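That baked-in assumption shows up directly in the knobs people already
turn; a common (illustrative, not recommended) way of encoding "most
of it is cached anyway" into the existing global parameters looks like
this:

    -- Illustrative settings only: lowering random_page_cost from its 4.0
    -- default is usually justified by an implicit cache-hit assumption.
    SET seq_page_cost = 1.0;
    SET random_page_cost = 2.0;
    SHOW effective_cache_size;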
From: "Kevin Grittner" on 19 Oct 2009 22:22
Jeff Davis <pgsql(a)j-davis.com> wrote:
> what kind of scenario
> would involve a stable 90% cache hit ratio for a table?

I'd bet accounts receivable applications often hit that. (Most
payments are on recent billings, with a sprinkling on older ones.) I'm
sure there are others.

-Kevin