From: Greg Smith on
Tom Lane wrote:
> Seems like a more appropriate solution would be to make it easier to do
> that subtraction, ie, make it easier to capture the values at a given
> time point and then get deltas from there. It's more general (you could
> have multiple saved sets of values), and doesn't require superuser
> permissions to do, and doesn't have the same potential for
> damn-I-wish-I-hadn't-done-that moments.
>

You can make the same argument about the existing pg_stat_reset
mechanism. I would love to completely rework the stats infrastructure
so that it's easier to capture values with timestamps, compute diffs,
and do trending. However, I'm not sure the database itself is
necessarily the best place to do that anyway. People who know what
they're doing are already handling this exact job using external tools
that grab regular snapshots for that purpose, so why try to duplicate
that work?
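
For what it's worth, the snapshot-and-delta approach those tools take
boils down to something like this (a rough sketch; the table name and
the choice of columns are arbitrary):

-- Take a timestamped snapshot of the bgwriter counters
CREATE TABLE bgwriter_snapshot AS
    SELECT now() AS snap_time, * FROM pg_stat_bgwriter;

-- Append another snapshot whenever a new baseline is wanted
INSERT INTO bgwriter_snapshot
    SELECT now(), * FROM pg_stat_bgwriter;

-- Delta between the two most recent snapshots
SELECT newer.snap_time - older.snap_time           AS elapsed,
       newer.checkpoints_req - older.checkpoints_req AS checkpoints_req,
       newer.buffers_alloc   - older.buffers_alloc   AS buffers_alloc
  FROM bgwriter_snapshot newer, bgwriter_snapshot older
 WHERE newer.snap_time = (SELECT max(snap_time) FROM bgwriter_snapshot)
   AND older.snap_time = (SELECT max(snap_time) FROM bgwriter_snapshot
                           WHERE snap_time < newer.snap_time);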

I'm trying to triage here, to scrub off the worst of the common
problems. I would never claim this is the perfect direction to follow
forever. There are a number of people who consider the inability to
reset the pg_stat_bgwriter stats in any way a bug that's gone unfixed
for two versions now. I completely agree with your larger point that
this style of implementation is not ideal as a long-term way to manage
statistics; I just don't have the time to spend on a major rewrite to
improve that. What I can offer is a fix for the most common issue I get
complaints about, in the form of a tool that, IMHO, is much more likely
to be used correctly by people who go looking for it than to be misused.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com



From: Rafael Martinez on
Greg Smith wrote:
>
>> Before 8.3, we had the stats_reset_on_server_start parameter and
>> the pg_postmaster_start_time() function. This was an easy way of
>> resetting *all* statistics delivered by pg_stat_* and knowing when
>> this was done. We could easily produce stats with per-second/hour/day
>> average values.
>>
>
> With this new feature I'm submitting, you can adjust your database
> startup scripts to make this happen again. Start the server,
> immediately loop over every database and call pg_stat_reset on each,
> and call pg_stat_reset_shared('bgwriter'). Now you've got completely
> cleared stats that are within a second or two of
> pg_postmaster_start_time(), which should be close enough for most
> purposes. Theoretically we could automate that better, but I've found
> it hard to justify working on, given that it's not that difficult to
> handle outside of the database once the individual pieces are exposed.
>
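
Something like this in our startup wrapper should then give us the
per-second averages back (a rough sketch, using a couple of the
bgwriter counters as the example):

-- Right after the server starts: clear the cluster-wide bgwriter counters
-- (pg_stat_reset() would also be run once in each database)
SELECT pg_stat_reset_shared('bgwriter');

-- Later: averages since startup, using the postmaster start time as baseline
SELECT checkpoints_req
         / extract(epoch FROM now() - pg_postmaster_start_time())
           AS checkpoints_req_per_sec,
       buffers_alloc
         / extract(epoch FROM now() - pg_postmaster_start_time())
           AS buffers_alloc_per_sec
  FROM pg_stat_bgwriter;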


Great, this is good enough and we get what we need. Thanks :-)

regards
--
Rafael Martinez, <r.m.guerrero(a)usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/


From: Greg Smith on
Itagaki Takahiro wrote:
> To be honest, I have a plan to add performance statistics counters to
> postgres. It is not bgwriter's counters, but cluster-level. I'd like
> to use your infrastructure in my work, too :)
>

The attached patch provides just that. It still works basically the same
as my earlier version, except that you pass it the name of what you want
to reset, and if you don't give it the only currently valid target
('bgwriter') it rejects the call:

gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
 checkpoints_req | buffers_alloc
-----------------+---------------
               4 |           129
(1 row)

gsmith=# select pg_stat_reset_shared('bgwriter');
 pg_stat_reset_shared
----------------------

(1 row)

gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
 checkpoints_req | buffers_alloc
-----------------+---------------
               0 |             7
(1 row)

gsmith=# select pg_stat_reset_shared('rubbish');
ERROR: Unrecognized reset target

I turn the input text into an enum choice as part of composing the
message to the stats collector. If you wanted to add some other shared,
cluster-wide reset capabilities in there, you could re-use most of this
infrastructure: just add an extra enum value, map the text onto that
enum, and write the actual handler that does the reset work. You should
be able to reuse the same new message type and external UI I implemented
for this specific clearing feature.
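
For instance, a hypothetical 'archiver' target (not something this patch
adds) would get wired in roughly like this; the enum name here is just a
guess at how it might end up being spelled:

/* Shared reset targets -- RESET_ARCHIVER is purely hypothetical here */
typedef enum PgStat_Shared_Reset_Target
{
	RESET_BGWRITER,
	RESET_ARCHIVER
} PgStat_Shared_Reset_Target;

/* In pg_stat_reset_shared(), map the text argument onto the enum */
if (strcmp(target, "bgwriter") == 0)
	msg.m_resettarget = RESET_BGWRITER;
else if (strcmp(target, "archiver") == 0)
	msg.m_resettarget = RESET_ARCHIVER;
else
	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("Unrecognized reset target")));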

I didn't see any interaction to be concerned about here with Magnus's
suggestion that at some point he'd like to target stats resets at
individual objects, such as a single table.

The main coding choice I wasn't really sure about is how I flag the
error case where you pass in a bad target. I do that validation and
throw ERRCODE_SYNTAX_ERROR before composing the message to the stats
collector. I didn't know whether I should create a whole new error code
just for this specific case or whether reusing an existing one was more
appropriate. Also, I didn't have the collector process itself validate
the data at all; it just quietly ignores bad messages, on the
presumption that everything has already been checked during message
creation. That seems consistent with the other code here--the other
message handlers only seem to throw errors when something really
terrible happens, not when they just don't find anything useful to do.
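
Roughly, the collector side ends up looking something like this (a
simplified sketch; the exact struct and function names may differ from
what's in the attached patch):

static void
pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
{
	if (msg->m_resettarget == RESET_BGWRITER)
	{
		/* Reset the cluster-wide background writer statistics */
		memset(&globalStats, 0, sizeof(globalStats));
	}

	/*
	 * The sender is assumed to have validated the target already, so an
	 * unrecognized value is silently ignored rather than raising an error.
	 */
}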

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com

From: Robert Haas on
On Thu, Jan 14, 2010 at 1:11 PM, Tom Lane <tgl(a)sss.pgh.pa.us> wrote:
> Greg Smith <greg(a)2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Actually, that brings up a more general question: what's with the
>>> enthusiasm for clearing statistics *at all*?
>
>> ... Right now, you're still carrying around
>> the history of the bad period forever though, and every check of the
>> pg_stat_bgwriter requires manually subtracting the earlier values out.
>
> Seems like a more appropriate solution would be to make it easier to do
> that subtraction, ie, make it easier to capture the values at a given
> time point and then get deltas from there.  It's more general (you could
> have multiple saved sets of values), and doesn't require superuser
> permissions to do, and doesn't have the same potential for
> damn-I-wish-I-hadn't-done-that moments.

True, but it's also more complicated to use. Most systems I'm
familiar with[1] that have performance counters just provide an option
to clear them. Despite the disadvantages you cite, it seems to be
fairly useful in practice; anyway, I have found it so.

....Robert

[1] The other design I've seen is a system that automatically resets,
say, once a day. It retains the statistics for the 24-hour period
between the most recent two resets, and the statistics for the partial
period following the last reset. But that doesn't seem appropriate
for PostgreSQL....


From: Magnus Hagander on
2010/1/14 Greg Smith <greg(a)2ndquadrant.com>:
> Itagaki Takahiro wrote:
>>
>> To be honest, I have a plan to add performance statistics counters to
>> postgres. It is not bgwriter's counters, but cluster-level. I'd like
>> to use your infrastructure in my work, too :)
>>
>
> The attached patch provides just that. It still works basically the same as my earlier version, except that you pass it the name of what you want to reset, and if you don't give it the only currently valid target ('bgwriter') it rejects the call:
>
> gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
>  checkpoints_req | buffers_alloc
> -----------------+---------------
>                4 |           129
> (1 row)
>
> gsmith=# select pg_stat_reset_shared('bgwriter');
>  pg_stat_reset_shared
> ----------------------
>
> (1 row)
>
> gsmith=# select checkpoints_req,buffers_alloc from pg_stat_bgwriter;
>  checkpoints_req | buffers_alloc
> -----------------+---------------
>                0 |             7
> (1 row)
>
> gsmith=# select pg_stat_reset_shared('rubbish');
> ERROR: Unrecognized reset target
>
> I turn the input text into an enum choice as part of composing the message to the stats collector. If you wanted to add some other shared, cluster-wide reset capabilities in there, you could re-use most of this infrastructure: just add an extra enum value, map the text onto that enum, and write the actual handler that does the reset work. You should be able to reuse the same new message type and external UI I implemented for this specific clearing feature.

Yeah, that seems fine.

> I didn't see any interaction to be concerned about here with Magnus's suggestion that at some point he'd like to target stats resets at individual objects, such as a single table.

Agreed.


> The main coding choice I wasn't really sure about is how I flag the error case where you pass in a bad target. I do that validation and throw ERRCODE_SYNTAX_ERROR before composing the message to the stats collector. I didn't know whether I should create a whole new error code just for this specific case or whether reusing an existing one was more appropriate. Also, I didn't have the collector process itself validate the data at all; it just quietly ignores bad messages, on the presumption that everything has already been checked during message creation. That seems consistent with the other code here--the other message handlers only seem to throw errors when something really terrible happens, not when they just don't find anything useful to do.

Creating a whole new error code seems completely wrong;
ERRCODE_SYNTAX_ERROR seems fine to me. As does not having the collector
check for unrecognized targets.

I looked the patch over, and I only really see one thing to comment on, which is:
+   if (strcmp(target, "bgwriter") == 0)
+       msg.m_resettarget = RESET_BGWRITER;
+   else
+   {
+       ereport(ERROR,
+               (errcode(ERRCODE_SYNTAX_ERROR),
+                errmsg("Unrecognized reset target")));
+   }

Maybe this should be "Unrecognized reset target: %s", target, and also
an errhint() saying which targets are allowed. Thoughts?
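
Something along these lines, perhaps (the hint wording is just a
suggestion):

	ereport(ERROR,
			(errcode(ERRCODE_SYNTAX_ERROR),
			 errmsg("Unrecognized reset target: %s", target),
			 errhint("Currently only \"bgwriter\" can be reset.")));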

As for the discussion about whether this is useful or not, I definitely
think it is. Yes, there are certainly general improvements that could be
made to the collection mechanism to make some things easier, but that
doesn't mean we shouldn't make the current solution more useful until we
(potentially) have them.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
