From: Stefan Kaltenbrunner on
Kevin Grittner wrote:
> Greg Smith <greg(a)2ndquadrant.com> wrote:
>
>> In many of the more secure environments I've worked in (finance,
>> defense), there is *no* access to the database server beyond what
>> comes out of port 5432 without getting a whole separate team of
>> people involved. If the DBA can write a simple monitoring program
>> themselves that presents data via the one port that is exposed,
>> that makes life easier for them.
>
> Right, we don't want to give the monitoring software an OS login for
> the database servers, for security reasons.

Depending on what exactly you mean by that, I do have to wonder how you
monitor more complex stuff (or stuff that requires elevated privs) - say
RAID health, multipath configuration, status of OS-level updates, "are
certain processes running or not" as well as basic parameters like CPU
or IO load. As in, stuff you cannot know unless you have it exported
through "some" port.


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

From: "Kevin Grittner" on
Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> wrote:
> Kevin Grittner wrote:

>> Right, we don't want to give the monitoring software an OS login
>> for the database servers, for security reasons.
>
> Depending on what exactly you mean by that, I do have to wonder how
> you monitor more complex stuff (or stuff that requires elevated
> privs) - say RAID health, multipath configuration, status of
> OS-level updates, "are certain processes running or not" as well as
> basic parameters like CPU or IO load. As in, stuff you cannot know
> unless you have it exported through "some" port.

Many of those are monitored on the server one way or another,
through a hardware card accessible only to the DBAs. The card sends
an email to the DBAs for any sort of distress, including impending
or actual drive failure, ambient temperature out of bounds, internal
or external power out of bounds, etc. OS updates are managed by the
DBAs through scripts. Ideally we would tie these in to our opcenter
software, which displays status through hundreds of "LED" boxes on
big plasma displays in our support areas (and can send emails and
jabber messages when things get to a bad state), but since the
messages are getting to the right people in a timely manner, this is
a low priority as far as monitoring enhancement requests go.

Only the DBAs have OS logins to database servers. Monitoring
software must deal with application ports (which have to be open
anyway, so that doesn't add any security risk). Since the hardware
monitoring doesn't know about file systems, and the disk space on
database servers is primarily an issue for the database, it made
sense to us to add the ability to check the space available to the
database through a database connection. Hence, fsutil.
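
To make the idea concrete, here is a rough sketch of the kind of check such a function could wrap. This is not the fsutil code: the function name, the returned fields, and the 10% warning threshold are all assumptions made for illustration. It only shows the underlying filesystem query, which a database-side function would then expose over the normal connection.

```python
import shutil


def free_space_report(path, warn_below=0.10):
    """Summarize free space on the filesystem that holds `path`.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of any existing tool.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "warn": free_fraction < warn_below,
    }
```

A monitoring client would call this for the data directory (and any tablespace locations) and raise an alert when "warn" is true.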

-Kevin


From: Greg Smith on
Stefan Kaltenbrunner wrote:
>>
>> Another popular question is "how far behind real-time is the archiver
>> process?" You can do this right now by duplicating the same xlog
>> file name scanning and sorting that the archiver does in your own
>> code, looking for .ready files. It would be simpler if you could
>> call pg_last_archived_xlogfile() and then just grab that file's
>> timestamp.
>
> Well, that one seems more reasonable to me; however, I'm not so sure
> that the proposed implementation feels right - though I can't come up
> with a better suggestion for now.

That's basically where I'm at, and I was looking more for feedback on
that topic rather than to get lost defending use-cases here. There are
a few of them, and you can debate their individual merits all day. As a
general comment to your line of criticism here, I feel the idea that
"we're monitoring that already via <x>" does not mean that an additional
check is without value. The kind of people who like redundancy in their
database like it in their monitoring, too. I feel there's at least one
unique thing exposing this bit buys you, and the fact that it can also be a
useful secondary source of information for systems monitoring is a
welcome bonus--regardless of whether good practice already supplies a
primary one.

> If you continue your line of thought you will have to add all kinds of
> stuff to the database, like CPU usage tracking, getting information
> about running processes, storage health.

I'm looking to expose something that only the database knows for
sure--"what is the archiver working on?"--via the standard way you ask
the database questions, a SELECT call. The database doesn't know
anything about the CPU, running processes, or storage, so suggesting
this path leads in that direction doesn't make any sense.
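
For reference, the "do it yourself" approach described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the archiver's actual logic: the function names are hypothetical, the caller must supply the path to the archive_status directory, and only plain 24-character segment names are considered (history and backup marker files are ignored).

```python
import os
import re
import time

# WAL segment file names are 24 hexadecimal characters; a segment waiting
# to be archived has a "<segment>.ready" marker in archive_status.
READY_RE = re.compile(r"^([0-9A-F]{24})\.ready$")


def oldest_ready_segment(status_dir):
    """Return (segment_name, marker_mtime) for the lowest-sorted .ready
    marker in status_dir, or None if nothing is waiting."""
    candidates = []
    for entry in os.listdir(status_dir):
        match = READY_RE.match(entry)
        if match:
            candidates.append(match.group(1))
    if not candidates:
        return None
    oldest = min(candidates)  # segment names sort in WAL order
    mtime = os.path.getmtime(os.path.join(status_dir, oldest + ".ready"))
    return oldest, mtime


def archiver_lag_seconds(status_dir, now=None):
    """Age in seconds of the oldest pending .ready marker (0.0 if none)."""
    found = oldest_ready_segment(status_dir)
    if found is None:
        return 0.0
    now = time.time() if now is None else now
    return max(0.0, now - found[1])
```

Note that this needs filesystem access to the data directory, which is exactly what a SQL-callable function would avoid.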

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com



From: Stefan Kaltenbrunner on
Greg Smith wrote:
> Stefan Kaltenbrunner wrote:
>>>
>>> Another popular question is "how far behind real-time is the archiver
>>> process?" You can do this right now by duplicating the same xlog
>>> file name scanning and sorting that the archiver does in your own
>>> code, looking for .ready files. It would be simpler if you could
>>> call pg_last_archived_xlogfile() and then just grab that file's
>>> timestamp.
>>
>> Well, that one seems more reasonable to me; however, I'm not so sure
>> that the proposed implementation feels right - though I can't come up
>> with a better suggestion for now.
>
> That's basically where I'm at, and I was looking more for feedback on
> that topic rather than to get lost defending use-cases here. There are
> a few of them, and you can debate their individual merits all day. As a
> general comment to your line of criticism here, I feel the idea that
> "we're monitoring that already via <x>" does not mean that an additional
> check is without value. The kind of people who like redundancy in their
> database like it in their monitoring, too. I feel there's at least one
> unique thing exposing this bit buys you, and the fact that it can also be a
> useful secondary source of information for systems monitoring is a
> welcome bonus--regardless of whether good practice already supplies a
> primary one.

Well, that might be true - but as somebody with an extensive sysadmin
background I was specifically ticked by the "disk full" stuff mentioned
upthread. Monitoring also means standardization, and somebody who runs
hundreds (or dozens) of servers is much better off getting the basics
monitored the same way on all systems and getting more specific as you
move up the (application) stack.


>
>> If you continue your line of thought you will have to add all kinds of
>> stuff to the database, like CPU usage tracking, getting information
>> about running processes, storage health.
>
> I'm looking to expose something that only the database knows for
> sure--"what is the archiver working on?"--via the standard way you ask
> the database questions, a SELECT call. The database doesn't know
> anything about the CPU, running processes, or storage, so suggesting
> this path leads in that direction doesn't make any sense.

Well, the database does not really know much about "free diskspace"
either - the only thing it knows is that it might not be able to write
data or execute a script, and unless you have shell/logfile access you
cannot diagnose those anyway, even with all the proposed functions.
However, what I was really trying to say is that we should focus on
getting the code stable first, and that prettying it up with fancy stat
functions is something that really can and should be done in a followup
release once we understand how the code behaves and maybe also how it is
likely going to evolve...


Stefan


From: Stefan Kaltenbrunner on
Kevin Grittner wrote:
> Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> wrote:
>> Kevin Grittner wrote:
>
>>> Right, we don't want to give the monitoring software an OS login
>>> for the database servers, for security reasons.
>> Depending on what exactly you mean by that, I do have to wonder how
>> you monitor more complex stuff (or stuff that requires elevated
>> privs) - say RAID health, multipath configuration, status of
>> OS-level updates, "are certain processes running or not" as well as
>> basic parameters like CPU or IO load. As in, stuff you cannot know
>> unless you have it exported through "some" port.
>
> Many of those are monitored on the server one way or another,
> through a hardware card accessible only to the DBAs. The card sends
> an email to the DBAs for any sort of distress, including impending
> or actual drive failure, ambient temperature out of bounds, internal
> or external power out of bounds, etc. OS updates are managed by the
> DBAs through scripts. Ideally we would tie these in to our opcenter
> software, which displays status through hundreds of "LED" boxes on
> big plasma displays in our support areas (and can send emails and
> jabber messages when things get to a bad state), but since the
> messages are getting to the right people in a timely manner, this is
> a low priority as far as monitoring enhancement requests go.

Well, a lot of people (including myself) consider it a necessity to
aggregate all that stuff in your system monitoring; only that way can
you guarantee proper dependency handling (i.e. no need to page for
"webserver not running" if the whole server is down).
There is also a case to be made for statistics tracking and long-term
monitoring of stuff.
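
As a small illustration of what dependency handling means here, the sketch below suppresses alerts for checks whose parent has already failed. It is a toy model under assumptions: the check names, the single-parent dependency map, and the function name are all made up for the example and do not correspond to any particular monitoring product.

```python
def alerts_to_page(status, depends_on):
    """Return the failed checks that should actually page someone.

    status:     dict mapping check name -> True (OK) or False (failed)
    depends_on: dict mapping check name -> name of the check it depends on
    A failed check is suppressed when any ancestor has also failed.
    """
    def ancestor_failed(check):
        seen = set()  # guards against accidental dependency cycles
        parent = depends_on.get(check)
        while parent is not None and parent not in seen:
            if not status.get(parent, True):
                return True
            seen.add(parent)
            parent = depends_on.get(parent)
        return False

    return sorted(
        check for check, ok in status.items()
        if not ok and not ancestor_failed(check)
    )
```

With the host down, only the host itself pages; with the host up and the webserver down, the webserver pages. That only works if host-level and application-level checks live in the same system.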


>
> Only the DBAs have OS logins to database servers. Monitoring
> software must deal with application ports (which have to be open
> anyway, so that doesn't add any security risk). Since the hardware
> monitoring doesn't know about file systems, and the disk space on
> database servers is primarily an issue for the database, it made
> sense to us to add the ability to check the space available to the
> database through a database connection. Hence, fsutil.

Still seems very backwards - there is much, much more that can only be
monitored from within the OS (and not from an external
iLO/RSA/IMM/DRAC/whatever) that you cannot really do from within the
database (or any other application), so I'm still puzzled...


Stefan
