From: Greg Stark
On Mon, Oct 19, 2009 at 2:54 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> How about calculating an effective percentage based on other
> information?  effective_cache_size, along with relation and database
> size, comes to mind.

I think previous proposals along these lines have fallen down when you
actually try to work out the formula. The problem is that you could have
a table which is much smaller than effective_cache_size but is never in
cache because it is one of many such tables.

I think it would still be good to have some kind of naive heuristic
here, as long as it's fairly predictable for DBAs.

But the long-term strategy here, I think, is to have some way to
measure the real cache hit rate on a per-table basis. Whether it's by
timing I/O operations, programmatic access to DTrace, or some other
kind of OS interface, knowing the real hit rate would be very helpful.
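For what it's worth, the stats collector already counts shared-buffer
hits and misses per table, which is at least a crude starting point
even though it can't see the OS cache. Something along these lines
(just a sketch) gives a lower bound on the real hit rate:

    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
    FROM pg_statio_user_tables
    ORDER BY heap_blks_hit + heap_blks_read DESC;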

Perhaps we could extrapolate from the shared-buffer cache percentage.
If a moderately high percentage of a table is in shared buffers, it
seems reasonable to assume the filesystem cache has a similar
distribution.
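For example (just a sketch, assuming the contrib module pg_buffercache
is installed; "accounts" is a made-up table name), the resident
fraction of a table could be read out like this:

    -- relpages is only the planner's estimate from the last VACUUM/ANALYZE
    SELECT c.relname,
           count(*) AS buffered_pages,
           round(100.0 * count(*) / nullif(c.relpages, 0), 1) AS pct_in_shared_buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    WHERE b.reldatabase = (SELECT oid FROM pg_database
                           WHERE datname = current_database())
      AND c.relname = 'accounts'
    GROUP BY c.relname, c.relpages;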

--
greg


From: Robert Haas
On Mon, Oct 19, 2009 at 5:54 PM, Kevin Grittner
<Kevin.Grittner(a)wicourts.gov> wrote:
> Robert Haas <robertmhaas(a)gmail.com> wrote:
>
>> I've been wondering if it might make sense to have a
>> "random_page_cost" and "seq_page_cost" setting for each TABLESPACE,
>> to compensate for the fact that different media might be faster or
>> slower, and a percent-cached setting for each table on top of
>> that.
>
> [after recovering from the initial cringing reaction...]
>
> How about calculating an effective percentage based on other
> information?  effective_cache_size, along with relation and database
> size, comes to mind.  How about the particular index being considered
> for the plan?  Of course, you might have to be careful about working
> in TOAST table size for a particular query, based on the columns
> retrieved.

I think that a per-tablespace page cost should be set by the DBA, just
as we do with the global page costs now.

OTOH, I think that a per-relation percent-in-cache should be
calculated automatically by the database (somehow), with an option for
the DBA to override it in case the database gets it wrong.  I gave a
lightning talk on this topic at PGCon.
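Just to make the shape of the knobs concrete -- this is only a sketch,
none of this syntax exists today, and the option names are invented:

    -- per-tablespace costs, set once by the DBA for the underlying media
    ALTER TABLESPACE fast_ssd SET (seq_page_cost = 0.5, random_page_cost = 1.0);

    -- per-relation cache fraction, normally maintained by the system,
    -- but overridable when it guesses wrong
    ALTER TABLE accounts SET (percent_cached = 90);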

> I have no doubt that there would be some major performance regressions
> in the first cut of anything like this, for at least *some* queries.
> The toughest part of this might be to get adequate testing to tune it
> for a wide enough variety of real-life situations.

Agreed.

....Robert


From: Jeff Davis
On Mon, 2009-10-19 at 16:39 -0700, Greg Stark wrote:
> But the long-term strategy here I think is to actually have some way
> to measure the real cache hit rate on a per-table basis. Whether it's
> by timing i/o operations, programmatic access to dtrace, or some other
> kind of os interface, if we could know the real cache hit rate it
> would be very helpful.

Maybe it would be simpler to just get the high-order bit: is this table
likely to be completely in cache (shared buffers or OS buffer cache), or
not?

The lower cache hit ratios are uninteresting: the performance
difference between 1% and 50% is only about a factor of two. And hit
ratios that are high but still short of "almost 100%" seem unlikely in
practice: what kind of scenario would involve a stable 90% cache hit
ratio for a table?
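To spell out the arithmetic: with h the hit ratio, the expected cost of
touching a page is roughly

    h * cache_cost + (1 - h) * disk_cost

and since disk_cost dwarfs cache_cost, what matters is the miss
fraction. Going from h = 0.01 to h = 0.50 only cuts the misses from
0.99 to 0.50 (about 2x), while going from 0.50 to 0.99 cuts them by
another factor of 50.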

Regards,
Jeff Davis



From: Jeff Janes
On Mon, Oct 19, 2009 at 4:29 PM, Greg Stark <gsstark(a)mit.edu> wrote:
> On Mon, Oct 19, 2009 at 4:21 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> marcin mank <marcin.mank(a)gmail.com> writes:
>>>> This proposal is just "hints by the back door", ISTM. As Tom says, there is
>>>> a justification for having it on tablespaces but not on individual tables.
>>
>>> If the parameter is defined as "the chance that a page is in cache"
>>> there is very real physical meaning to it.
>>
>> We have no such parameter...
>
>
> And we want our parameters to be things the DBA has a chance of being
> able to estimate.

Do the current parameters meet that standard? When setting
seq_page_cost now, don't people have a lot of "Well, we're about this
likely to find it in the cache anyway" built into their settings?
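E.g. the usual tuning advice amounts to folding an assumed hit rate
into the cost GUCs (illustrative numbers only, not a recommendation):

    -- default random_page_cost is 4.0; it routinely gets lowered on the
    -- theory that most "random" reads will be served from cache anyway
    SET random_page_cost = 2.0;
    SET effective_cache_size = '6GB';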


Jeff


From: "Kevin Grittner" on
Jeff Davis <pgsql(a)j-davis.com> wrote:

> what kind of scenario
> would involve a stable 90% cache hit ratio for a table?

I'd bet accounts receivable applications often hit that.
(Most payments on recent billings; a sprinkling on older ones.)
I'm sure there are others.

-Kevin

