From: Jeff Davis on
On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> I'd bet accounts receivable applications often hit that.
> (Most payments on recent billings; a sprinkling on older ones.)
> I'm sure there are others.

You worded the examples in terms of writes (I think), and we're talking
about read caching, so I still don't entirely understand.

Also, the example sounds like you'd like to optimize across queries.
There's no mechanism for the planner to remember some query executed a
while ago, and match it up to some new query that it's trying to plan.
Maybe there should be, but that's an entirely different feature.

I'm not clear on the scenario that we're trying to improve.

Regards,
Jeff Davis


From: Greg Smith on
On Mon, 19 Oct 2009, Jeff Davis wrote:

> On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
>> I'd bet accounts receivable applications often hit that.
>> (Most payments on recent billings; a sprinkling on older ones.)
>> I'm sure there are others.
>
> You worded the examples in terms of writes (I think), and we're talking
> about read caching, so I still don't entirely understand.

No, that part was fair. The unfortunate reality of accounts receivable is
that the reports listing who owes you money get run much more often than
payments get posted into the system.

> Also, the example sounds like you'd like to optimize across queries.
> There's no mechanism for the planner to remember some query executed a
> while ago, and match it up to some new query that it's trying to plan.

Some of the use-cases here involve situations where you know most of a
relation is likely to be in cache just because there's not much going on
that might evict it. In any case, something that attempts to model some
average percentage you can expect a relation to be in cache is in effect
serving as a memory of past queries.

> I'm not clear on the scenario that we're trying to improve.

Duh, that would be the situation where someone wants optimizer hints but
can't call them that because then the idea would be reflexively rejected!

Looks like I should dust off the much more complicated proposal for
tracking and using in-cache hit percentages I keep not having time to
finish writing up. Allowing a user-set value for that is a lot more
reasonable if the system computes a reasonable one itself under normal
circumstances. That's what I think people really want, even if it's not
what they're asking for.

--
* Greg Smith gsmith(a)gregsmith.com http://www.gregsmith.com Baltimore, MD

From: marcin mank on
On Tue, Oct 20, 2009 at 1:21 AM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
>> If the parameter is defined as "the chance that a page is in cache"
>> there is very real physical meaning to it.
>
> We have no such parameter...

What a simple person like me would think would work is:

- Call the parameter "cached_probability".
- Invent a way to store it (I'd actually try to do it the exact same
way the recent "alter table set statistics distinct" work does it);
a made-up sketch of what that might look like is below.
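
Something like this is what I have in mind for the storage part (both the
syntax and the table name are invented here, assuming it ends up as an
ordinary per-table reloption):

alter table accounts set (cached_probability = 0.9);  -- hypothetical syntax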

a) less radical idea: replace all usage of random_page_cost with

   seq_page_cost * cached_probability + random_page_cost * (1 - cached_probability)

b) more radical idea:
   b1) invent a new GUC: cached_page_cost
   b2) replace all usage of seq_page_cost with
       cached_page_cost * cached_probability + seq_page_cost * (1 - cached_probability)
   b3) replace all usage of random_page_cost with
       cached_page_cost * cached_probability + random_page_cost * (1 - cached_probability)
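
To make the arithmetic concrete (numbers picked purely for illustration):
assume the default seq_page_cost = 1.0 and random_page_cost = 4.0, and a
table we believe to be 90% cached. Under idea (a) the effective cost per
randomly fetched page becomes:

select 1.0 * 0.9 + 4.0 * (1 - 0.9) as effective_random_page_cost;  -- 1.3

so index scans on mostly-cached tables stop being penalized by the flat 4.0.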

> How would you come up with sensible figures for this hypothetical parameter?

select schemaname, relname,
       heap_blks_hit / cast(heap_blks_read + heap_blks_hit + 1 as float)
         as cached_fraction  -- +1 avoids dividing by zero for untouched tables
from pg_statio_all_tables;

Would be a nice starting point.
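
If index blocks should be counted too, the same view has
idx_blks_read/idx_blks_hit (null for tables without indexes, hence the
coalesce), so something like this should also work:

select schemaname, relname,
       (heap_blks_hit + coalesce(idx_blks_hit, 0))
         / cast(heap_blks_read + heap_blks_hit
                + coalesce(idx_blks_read, 0) + coalesce(idx_blks_hit, 0) + 1
                as float) as cached_fraction
from pg_statio_all_tables
order by heap_blks_read + heap_blks_hit desc
limit 20;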

Greetings
Marcin

From: "Kevin Grittner" on
Greg Smith <gsmith(a)gregsmith.com> wrote:

> The unfortunate reality of accounts receivable is that the reports
> listing who owes you money get run much more often than payments get
> posted into the system.

How often do you have to print a list of past due accounts? I've
generally seen that done weekly or monthly, in the same places where
there are many people staffing payment windows full time just to
collect money from those lining up to pay.
document which identifies the receivable (like, in our case, a traffic
or parking ticket), there's no need to look at any older data in the
database.

Heck, even our case management applications likely follow the 90% to
95% cache hit pattern in counties whose databases aren't fully cached, since
there's a lot more activity on court cases filed this year and last
year than on cases filed 30 years ago.

-Kevin

From: Greg Smith on
On Tue, 20 Oct 2009, Kevin Grittner wrote:

> How often do you have to print a list of past due accounts? I've
> generally seen that done weekly or monthly, in the same places where
> there are many people staffing payment windows full time just to
> collect money from those lining up to pay.

This is only because your A/R collections staff includes people with guns.

--
* Greg Smith gsmith(a)gregsmith.com http://www.gregsmith.com Baltimore, MD
