From: Cédric Villemain on
On Tuesday 20 October 2009 06:30:26, Greg Smith wrote:
> On Mon, 19 Oct 2009, Jeff Davis wrote:
> > On Mon, 2009-10-19 at 21:22 -0500, Kevin Grittner wrote:
> >> I'd bet accounts receivable applications often hit that.
> >> (Most payments on recent billings; a sprinkling on older ones.)
> >> I'm sure there are others.
> >
> > You worded the examples in terms of writes (I think), and we're talking
> > about read caching, so I still don't entirely understand.
>
> No, that part was fair. The unfortunate reality of accounts receivable is
> that reports listing the people who owe you money run much more often
> than payments get posted into the system.
>
> > Also, the example sounds like you'd like to optimize across queries.
> > There's no mechanism for the planner to remember some query executed a
> > while ago, and match it up to some new query that it's trying to plan.
>
> Some of the use-cases here involve situations where you know most of a
> relation is likely to be in cache just because there's not much going on
> that might evict it. In any case, something that attempts to model some
> average percentage you can expect a relation to be in cache is in effect
> serving as a memory of past queries.
>
> > I'm not clear on the scenario that we're trying to improve.
>
> Duh, that would be the situation where someone wants optimizer hints but
> can't call them that because then the idea would be reflexively rejected!
>
> Looks like I should dust off the much more complicated proposal for
> tracking and using in-cache hit percentages I keep not having time to
> finish writing up. Allowing a user-set value for that is a lot more
> reasonable if the system computes a reasonable one itself under normal
> circumstances. That's what I think people really want, even if it's not
> what they're asking for.
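The "tracking in-cache hit percentages" idea Greg alludes to is not spelled out in the thread, but one plausible ingredient is smoothing the observed buffer hit ratio per relation so the planner consults a stable estimate rather than a noisy snapshot. A minimal illustrative sketch (hypothetical function, with counters shaped like the `heap_blks_hit`/`heap_blks_read` columns of `pg_statio_user_tables`; this is not actual planner code):

```python
# Sketch: smooth a relation's observed buffer-cache hit ratio with an
# exponential moving average, yielding an "expected fraction cached"
# that a cost model could consult.

def update_cached_fraction(prev_estimate, heap_blks_hit, heap_blks_read,
                           alpha=0.1):
    """Blend the latest observed hit ratio into the running estimate.

    alpha controls how quickly the estimate forgets old behavior.
    """
    total = heap_blks_hit + heap_blks_read
    if total == 0:
        return prev_estimate  # no new I/O: keep the old estimate
    observed = heap_blks_hit / total
    return (1 - alpha) * prev_estimate + alpha * observed

estimate = 0.5  # start from an agnostic prior
for hit, read in [(900, 100), (950, 50), (990, 10)]:
    estimate = update_cached_fraction(estimate, hit, read)
print(round(estimate, 3))  # drifts toward the ~90-99% observed ratios
```

A user-set override, as Greg suggests, would then simply replace the computed estimate for that relation.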

Do you already have some work in a git repository somewhere?


>
> --
> * Greg Smith gsmith(a)gregsmith.com http://www.gregsmith.com Baltimore, MD
>

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org
From: Cédric Villemain on
On Monday 19 October 2009 23:14:40, Robert Haas wrote:
> On Mon, Oct 19, 2009 at 5:08 PM, marcin mank <marcin.mank(a)gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
> >
> > I think this is a good idea for widely-wanted planner hints. This way
> > you can say "I do NOT want this table to be index-scanned, because I
> > know it is not cached" by setting its random_page_cost to a large
> > value (and obviously you can do it the other way around: by setting
> > random_page_cost to 1 you say "I don't care how you fetch the pages,
> > they are all in cache").
> >
> > The value for the per-table setting could be inferred from
> > pg_stat(io)?.*tables . We could have a tool to suggest appropriate
> > values.
> >
> > We could call it something like cached_percentage (and have the cost
> > of a random tuple fetch be inferred from the global random_page_cost,
> > seq_tuple_cost and the per-table cached_percentage). Then we could set
> > the global random_page_cost to a sane value like 200. Now one can
> > wonder why the planner works while having such blatantly unrealistic
> > values for random_page_cost :)
> >
> > What do you think?
>
> I've been thinking about this a bit, too. I've been wondering if it
> might make sense to have a "random_page_cost" and "seq_page_cost"
> setting for each TABLESPACE, to compensate for the fact that different
> media might be faster or slower, and a percent-cached setting for each
> table over top of that.
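As a rough illustration of how the proposed cached_percentage could feed the cost model (hypothetical names and formula, inferred from marcin's description, not actual planner code): the effective cost of a random page fetch becomes a weighted blend of the cached and uncached cases, which is why a "blatantly unrealistic" global random_page_cost of 200 can still yield sane plans for a well-cached table.

```python
def effective_random_page_cost(random_page_cost, cached_page_cost,
                               cached_fraction):
    """Blend the per-page fetch cost by the fraction of the table
    assumed resident in cache: a cached page costs roughly a cheap
    in-memory fetch, an uncached one pays the full random_page_cost."""
    return (cached_fraction * cached_page_cost
            + (1 - cached_fraction) * random_page_cost)

# Global random_page_cost of 200, but a table believed 99.5% cached
# comes out near the in-memory cost of 1:
print(effective_random_page_cost(200, 1.0, 0.995))
# A completely uncached table pays the full penalty:
print(effective_random_page_cost(200, 1.0, 0.0))  # 200.0
```

Per-tablespace base costs, as Robert suggests, would just supply different `random_page_cost` inputs per storage medium.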

At least per-TABLESPACE settings should exist. I totally agree with that.

>
> ...Robert
>

From: Cédric Villemain on
On Monday 19 October 2009 23:27:20, Greg Stark wrote:
> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank(a)gmail.com> wrote:
> > Currently random_page_cost is a GUC. I propose that this could be set
> > per-table.
>
> Or per-tablespace.
>
> Yes, I think there are a class of GUCs which describe the physical
> attributes of the storage system which should be per-table or
> per-tablespace. random_page_cost, sequential_page_cost,
> effective_io_concurrency come to mind.

and perhaps effective_cache_size.

You can have situations where you don't want some tables to go into OS memory
(you can disable that at the filesystem level ... I'd like to be able to do
that at the postgres level, but that is another point).

So you put those tables in a separate tablespace and tell postgresql that
effective_cache_size is 0 (for this tablespace); it is then up to postgres to
do the right thing with that ;)


>
> While this isn't a simple flag to change it does seem like a bit of a
> SMOP. The GUC infrastructure stores these values in global variables
> which the planner and other systems consult directly. They would
> instead have to be made storage parameters which the planner and other
> systems check on the appropriate table and default to the global GUC
> if they're not set.
>
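The fallback behavior Greg describes — consult the relation's storage parameter, then the tablespace's, then the global GUC — is simple to sketch. This is illustrative Python with made-up names, not the actual reloptions/GUC machinery:

```python
# Global GUC defaults, consulted only when no override is set.
GLOBAL_GUCS = {"random_page_cost": 4.0, "seq_page_cost": 1.0}

def lookup_cost(name, table_opts=None, tablespace_opts=None):
    """Resolve a planner cost parameter: the per-table setting wins,
    then the per-tablespace setting, then the global GUC."""
    for opts in (table_opts, tablespace_opts):
        if opts and name in opts:
            return opts[name]
    return GLOBAL_GUCS[name]

# A table on an SSD tablespace, with no per-table override:
print(lookup_cost("random_page_cost",
                  tablespace_opts={"random_page_cost": 1.1}))  # 1.1
# No overrides anywhere falls back to the GUC:
print(lookup_cost("seq_page_cost"))  # 1.0
```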



From: Robert Haas on
On Thu, Oct 22, 2009 at 11:16 AM, Cédric Villemain
<cedric.villemain(a)dalibo.com> wrote:
> On Monday 19 October 2009 23:27:20, Greg Stark wrote:
>> On Mon, Oct 19, 2009 at 2:08 PM, marcin mank <marcin.mank(a)gmail.com> wrote:
>> > Currently random_page_cost is a GUC. I propose that this could be set
>> > per-table.
>>
>> Or per-tablespace.
>>
>> Yes, I think there are a class of GUCs which describe the physical
>> attributes of the storage system which should be per-table or
>> per-tablespace. random_page_cost, sequential_page_cost,
>> effective_io_concurrency come to mind.
>
> and perhaps effective_cache_size.
>
> You can have situations where you don't want some tables to go into OS memory
> (you can disable that at the filesystem level ... I'd like to be able to do
> that at the postgres level, but that is another point).
>
> So you put those tables in a separate tablespace and tell postgresql that
> effective_cache_size is 0 (for this tablespace); it is then up to postgres to
> do the right thing with that ;)

Why would you ever want to set effective_cache_size to 0?

....Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: Greg Stark on
On Thu, Oct 22, 2009 at 8:16 AM, Cédric Villemain
<cedric.villemain(a)dalibo.com> wrote:
> You can have situations where you don't want some tables to go into OS memory

I don't think this is a configuration we want to cater for. The
sysadmin shouldn't be required to understand the I/O pattern of
postgres. He or she cannot know whether the database will want to
access the same blocks twice for internal algorithms, in ways that
aren't visible from the user's point of view.

The scenarios where you might want to do this would be if you know
there are tables which are accessed very randomly with no locality and
very low cache hit rates. I think the direction we want to head is
towards making sure the cache manager is automatically resistant to
such data.

There is another use case which perhaps needs to be addressed: if the
user has some queries which are very latency sensitive and others
which are not latency sensitive. In that case it might be very
important to keep the pages of data used by the high priority queries
in the cache. That's something we should have a high level abstract
interface for, not depend on low level system features.
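One shape such a high-level interface could take is a cache whose latency-sensitive entries are marked so that ordinary churn never evicts them. A toy sketch (purely illustrative; PostgreSQL's buffer manager works differently, and the names here are invented):

```python
from collections import OrderedDict

class PinnedLRUCache:
    """Toy cache: entries marked 'pinned' are never chosen for
    eviction, so pages backing latency-sensitive queries stay
    resident while ordinary pages cycle through LRU as usual.
    (If everything is pinned, the toy simply grows past capacity.)"""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> pinned flag, LRU order

    def access(self, key, pinned=False):
        """Touch a page; returns True on a cache hit."""
        if key in self.entries:
            self.entries.move_to_end(key)
            self.entries[key] = self.entries[key] or pinned
            return True
        if len(self.entries) >= self.capacity:
            # Evict the least-recently-used *unpinned* entry.
            for victim, is_pinned in self.entries.items():
                if not is_pinned:
                    del self.entries[victim]
                    break
        self.entries[key] = pinned
        return False

cache = PinnedLRUCache(capacity=2)
cache.access("hot-index-page", pinned=True)
for page in ["a", "b", "c"]:       # unpinned churn evicts only a, b
    cache.access(page)
print(cache.access("hot-index-page"))  # True: still resident
```

The point of the abstraction is that the user declares *which queries* are latency-sensitive, and the system derives which pages to protect, rather than the sysadmin manipulating OS-level caching directly.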

--
greg
