From: Greg Smith on 14 Jan 2010 12:42

Tom Lane wrote:
> Actually, that brings up a more general question: what's with the
> enthusiasm for clearing statistics *at all*?  ISTM that's something
> you should do only in dire emergencies, like the collector went
> haywire and has now got a bunch of garbage numbers.  The notion of
> resetting subsets of the stats seems even more dubious, because now
> you have numbers that aren't mutually comparable.  So I fail to
> understand why the desire to expend valuable development time on
> any of this.

When doing checkpoint tuning, the usual starting point is the ratio of timed to segment-based checkpoints, along with the corresponding balance of buffers written by the backends vs. the checkpoints.  When that shows poor behavior, typically because checkpoint_segments is too low, you change its value and then resume monitoring at the new setting.  Right now, though, you're still carrying around the history of the bad period forever, and every check of pg_stat_bgwriter requires manually subtracting the earlier values out.  What people would like to do is reset those counters after adjusting checkpoint_segments, so they can eyeball the proportions directly instead.  That's exactly what the patch does.  If I didn't see this request in the field every month I wouldn't have spent a minute on a patch to add it.

There was a suggestion that subsets of the data I'm clearing might be useful to target, which I rejected on the grounds that it made it possible to get an inconsistent set of results, exactly the problem you're concerned about.  You really need to clear everything that shows up in pg_stat_bgwriter or not touch it at all.

The main use case I'm trying to support is the person who just made a config change and now wants to do:

  select pg_stat_reset();
  select pg_stat_reset_shared('bgwriter');

so that all of the stats they're now dealing with are from the same post-tuning time period.  Having numbers that are "mutually comparable" across the whole system is exactly the reason why this new call is needed, because right now there's this one part you just can't touch.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.com
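A minimal sketch of the kind of pg_stat_bgwriter check described above.  The column names are the standard ones from that view; the derived ratio aliases (pct_req_checkpoints, pct_backend_writes) are only illustrative:

  select
      checkpoints_timed,
      checkpoints_req,
      -- share of checkpoints forced by filling checkpoint_segments; a high
      -- value suggests checkpoint_segments is too low
      round(100.0 * checkpoints_req /
            nullif(checkpoints_timed + checkpoints_req, 0), 1) as pct_req_checkpoints,
      buffers_checkpoint,
      buffers_backend,
      -- share of buffer writes done directly by backends rather than by
      -- checkpoints or the background writer
      round(100.0 * buffers_backend /
            nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) as pct_backend_writes
  from pg_stat_bgwriter;

After a reset, these percentages reflect only the post-tuning period instead of being diluted by the old history.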
From: Greg Smith on 14 Jan 2010 12:44

Euler Taveira de Oliveira wrote:
> Greg Smith escreveu:
>
>> pg_stat_reset( which text )
>> which := 'buffers' | 'checkpoints' | 'tables' | 'functions' | ...
>>
>
> What about adding 'all' too? Or the idea is resetting all global counters when
> we call pg_stat_reset() (without parameters)?

Once there's more than one piece to clear, maybe adding an 'all' target makes sense.  In the context of the updated patch I've finished, it just doesn't make sense given the code involved.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.com
From: Magnus Hagander on 14 Jan 2010 12:46

2010/1/14 Tom Lane <tgl(a)sss.pgh.pa.us>:
> Rafael Martinez <r.m.guerrero(a)usit.uio.no> writes:
>> Is there any chance of implementing a way of knowing when was the last
>> time statistics delivered via pg_stat_* were reset?
>
> Actually, that brings up a more general question: what's with the
> enthusiasm for clearing statistics *at all*?  ISTM that's something
> you should do only in dire emergencies, like the collector went
> haywire and has now got a bunch of garbage numbers.  The notion of
> resetting subsets of the stats seems even more dubious, because now
> you have numbers that aren't mutually comparable.  So I fail to
> understand why the desire to expend valuable development time on
> any of this.

s/collector/application/ and you've got one reason.

Here's an example I hit the other day.  Examining pg_stat_user_functions showed one function taking much longer than you'd expect: called about 6 million times, with a total of about 7 days spent in it.  The reason turned out to be a missing index.  Without clearing the stats, it will take a *long* time before the average comes down enough to make it possible to use the simple

  SELECT self_time/calls FROM pg_stat_user_functions WHERE...

to monitor it.  Sure, if you have a system that graphs it, it'll update properly, but for quick manual checks that view suddenly becomes a lot less useful.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
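A sketch of the kind of quick manual check being described.  The calls and self_time columns are the standard pg_stat_user_functions ones; the function name slow_func in the filter is purely hypothetical:

  SELECT schemaname, funcname, calls,
         -- per-call average; until the counters are reset this stays skewed
         -- by the history accumulated before the missing index was added
         self_time / nullif(calls, 0) AS avg_self_time
  FROM pg_stat_user_functions
  WHERE funcname = 'slow_func';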
From: Greg Smith on 14 Jan 2010 12:54

Rafael Martinez wrote:
> One thing I miss from the statistics you can get via pg_stat_* is
> information about how long we have been collecting stats (or in other
> words, when was the last time the stats were reset)

I've considered adding this for the same reasons you're asking about it, but I'm not happy with the trade-offs involved.  The problem is that you have to presume the server was running for the entire time since the stats were reset for that data to be useful.  Unless people are in that situation, they're going to get data that may not represent what they think it does.  Realistically, if you want a timestamp that always means something useful, you have to reset the stats at every server start, which leads us to:

> Before 8.3, we had the stats_reset_on_server_start parameter and the
> pg_postmaster_start_time() function. This was an easy way of resetting
> *all* statistics delivered by pg_stat_* and knowing when this was done.
> We were able to produce stats with information about sec/hours/days
> average values in an easy way.

With the new feature I'm submitting, you can adjust your database startup scripts to make this happen again.  Start the server, immediately loop over every database and call pg_stat_reset on each, and call pg_stat_reset_shared('bgwriter').  Now you've got completely cleared stats whose starting point is within a second or two of pg_postmaster_start_time(), which should be close enough for most purposes.  Theoretically we could automate that better, but I've found it hard to justify working on, given that it's not that difficult to handle outside of the database once the individual pieces are exposed.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com  www.2ndQuadrant.com
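A minimal sketch of the SQL such a startup script would issue.  pg_stat_reset() only clears the counters of the database it is run in, so the first statement has to be repeated per database (the loop itself lives in the startup script); pg_stat_reset_shared('bgwriter') is the new call from the patch under discussion and runs once per cluster:

  -- run in *each* database, since pg_stat_reset() only affects the current one
  SELECT pg_stat_reset();

  -- run once, in any database, to clear the cluster-wide pg_stat_bgwriter counters
  SELECT pg_stat_reset_shared('bgwriter');

  -- the cleared stats now date from approximately
  SELECT pg_postmaster_start_time();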
From: Tom Lane on 14 Jan 2010 13:11

Greg Smith <greg(a)2ndquadrant.com> writes:
> Tom Lane wrote:
>> Actually, that brings up a more general question: what's with the
>> enthusiasm for clearing statistics *at all*?

> ... Right now, you're still carrying around
> the history of the bad period forever though, and every check of the
> pg_stat_bgwriter requires manually subtracting the earlier values out.

Seems like a more appropriate solution would be to make it easier to do that subtraction, ie, make it easier to capture the values at a given time point and then get deltas from there.  It's more general (you could have multiple saved sets of values), and doesn't require superuser permissions to do, and doesn't have the same potential for damn-I-wish-I-hadn't-done-that moments.

			regards, tom lane
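A sketch of the snapshot-and-delta approach being suggested here.  The bgwriter_snapshot table name is made up for illustration, and only a few of the pg_stat_bgwriter columns are shown:

  -- capture a baseline of the current counters (no reset, no superuser needed)
  CREATE TABLE bgwriter_snapshot AS
      SELECT now() AS snap_time, * FROM pg_stat_bgwriter;

  -- later, report only the activity since that baseline
  SELECT now() - s.snap_time                           AS elapsed,
         b.checkpoints_timed   - s.checkpoints_timed   AS checkpoints_timed,
         b.checkpoints_req     - s.checkpoints_req     AS checkpoints_req,
         b.buffers_checkpoint  - s.buffers_checkpoint  AS buffers_checkpoint,
         b.buffers_backend     - s.buffers_backend     AS buffers_backend
  FROM pg_stat_bgwriter b, bgwriter_snapshot s;

Multiple such snapshot tables could be kept around, one per tuning change, which is the "multiple saved sets of values" point above.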