From: Stefan Kaltenbrunner on 15 Jan 2010 12:30

Kevin Grittner wrote:
> Greg Smith <greg(a)2ndquadrant.com> wrote:
>
>> In many of the more secure environments I've worked in (finance,
>> defense), there is *no* access to the database server beyond what
>> comes out of port 5432 without getting a whole separate team of
>> people involved. If the DBA can write a simple monitoring program
>> themselves that presents data via the one port that is exposed,
>> that makes life easier for them.
>
> Right, we don't want to give the monitoring software an OS login for
> the database servers, for security reasons.

Depending on what exactly you mean by that, I do have to wonder how
you monitor more complex stuff (or stuff that requires elevated
privs) - say RAID health, multipath configuration, status of OS
level updates, "are certain processes running or not" as well as
basic parameters like CPU or IO load. As in, stuff you cannot know
unless you have it exported through "some" port.

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers(a)postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
From: "Kevin Grittner" on 15 Jan 2010 13:03

Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> wrote:
> Kevin Grittner wrote:
>> Right, we don't want to give the monitoring software an OS login
>> for the database servers, for security reasons.
>
> Depending on what exactly you mean by that, I do have to wonder how
> you monitor more complex stuff (or stuff that requires elevated
> privs) - say RAID health, multipath configuration, status of OS
> level updates, "are certain processes running or not" as well as
> basic parameters like CPU or IO load. As in, stuff you cannot know
> unless you have it exported through "some" port.

Many of those are monitored on the server one way or another,
through a hardware card accessible only to the DBAs. The card sends
an email to the DBAs for any sort of distress, including impending
or actual drive failure, ambient temperature out of bounds, internal
or external power out of bounds, etc. OS updates are managed by the
DBAs through scripts. Ideally we would tie these in to our opcenter
software, which displays status through hundreds of "LED" boxes on
big plasma displays in our support areas (and can send emails and
jabber messages when things get to a bad state), but since the
messages are getting to the right people in a timely manner, this is
a low priority as far as monitoring enhancement requests go.

Only the DBAs have OS logins to database servers. Monitoring
software must deal with application ports (which have to be open
anyway, so that doesn't add any security risk). Since the hardware
monitoring doesn't know about file systems, and the disk space on
database servers is primarily an issue for the database, it made
sense to us to add the ability to check the space available to the
database through a database connection. Hence, fsutil.

-Kevin
From: Greg Smith on 15 Jan 2010 13:44

Stefan Kaltenbrunner wrote:
>>
>> Another popular question is "how far behind real-time is the archiver
>> process?" You can do this right now by duplicating the same xlog
>> file name scanning and sorting that the archiver does in your own
>> code, looking for .ready files. It would be simpler if you could
>> call pg_last_archived_xlogfile() and then just grab that file's
>> timestamp.
>
> Well, that one seems like more reasonable reasoning to me; however,
> I'm not so sure that the proposed implementation feels right -
> though I can't come up with a better suggestion for now.

That's basically where I'm at, and I was looking more for feedback on
that topic rather than to get lost defending use-cases here. There are
a few of them, and you can debate their individual merits all day. As a
general comment to your line of criticism here, I feel the idea that
"we're monitoring that already via <x>" does not mean that an additional
check is without value. The kind of people who like redundancy in their
database like it in their monitoring, too. I feel there's at least one
unique thing exposing this bit buys you, and the fact that it can be a
useful secondary source of information too for systems monitoring is
a welcome bonus--regardless of whether good practice already supplies a
primary one.

> If you continue your line of thought you will have to add all kinds of
> stuff to the database, like CPU usage tracking, getting information
> about running processes, storage health.

I'm looking to expose something that only the database knows for
sure--"what is the archiver working on?"--via the standard way you ask
the database questions, a SELECT call. The database doesn't know
anything about the CPU, running processes, or storage, so suggesting
this path leads in that direction doesn't make any sense.
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(a)2ndQuadrant.com www.2ndQuadrant.com
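
For readers who want to see what "duplicating the same xlog file name
scanning and sorting" might look like from outside the database, here
is a minimal Python sketch. It is not the archiver's actual code. It
assumes the server's archive status directory (pg_xlog/archive_status
on releases of this era) holds one "<segment>.ready" marker per segment
still waiting to be archived, and that plain name ordering is a close
enough stand-in for the archiver's own ordering. The function names
pending_segments and archiver_lag_seconds are hypothetical, and the
script needs OS-level read access to the data directory, which is
exactly what the proposed SQL function would avoid.

```python
import os
import time

READY_SUFFIX = ".ready"


def pending_segments(status_dir):
    """Return names of segments that still have a .ready marker, sorted by name.

    Assumption: lexical order of the file names approximates the order in
    which the archiver would pick them up.
    """
    names = [
        entry[: -len(READY_SUFFIX)]
        for entry in os.listdir(status_dir)
        if entry.endswith(READY_SUFFIX)
    ]
    return sorted(names)


def archiver_lag_seconds(status_dir, now=None):
    """Age in seconds of the first pending .ready marker, or 0.0 if none."""
    pending = pending_segments(status_dir)
    if not pending:
        return 0.0
    marker = os.path.join(status_dir, pending[0] + READY_SUFFIX)
    current = time.time() if now is None else now
    # Clamp at zero in case of clock skew between marker mtime and "now".
    return max(0.0, current - os.path.getmtime(marker))
```

A monitoring check could call archiver_lag_seconds() on the status
directory and alert once the value passes a site-specific threshold.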
From: Stefan Kaltenbrunner on 16 Jan 2010 12:18

Greg Smith wrote:
> Stefan Kaltenbrunner wrote:
>>>
>>> Another popular question is "how far behind real-time is the archiver
>>> process?" You can do this right now by duplicating the same xlog
>>> file name scanning and sorting that the archiver does in your own
>>> code, looking for .ready files. It would be simpler if you could
>>> call pg_last_archived_xlogfile() and then just grab that file's
>>> timestamp.
>>
>> Well, that one seems like more reasonable reasoning to me; however,
>> I'm not so sure that the proposed implementation feels right -
>> though I can't come up with a better suggestion for now.
>
> That's basically where I'm at, and I was looking more for feedback on
> that topic rather than to get lost defending use-cases here. There are
> a few of them, and you can debate their individual merits all day. As a
> general comment to your line of criticism here, I feel the idea that
> "we're monitoring that already via <x>" does not mean that an additional
> check is without value. The kind of people who like redundancy in their
> database like it in their monitoring, too. I feel there's at least one
> unique thing exposing this bit buys you, and the fact that it can be a
> useful secondary source of information too for systems monitoring is
> a welcome bonus--regardless of whether good practice already supplies a
> primary one.

Well, that might be true - but as somebody with an extensive sysadmin
background I was specifically ticked by the "disk full" stuff
mentioned upthread. Monitoring also means standardization, and
somebody who runs hundreds (or dozens) of servers is much better off
getting the basics monitored the same way on all systems and getting
more specific as you move up the (application) stack.

>
>> If you continue your line of thought you will have to add all kinds of
>> stuff to the database, like CPU usage tracking, getting information
>> about running processes, storage health.
>
> I'm looking to expose something that only the database knows for
> sure--"what is the archiver working on?"--via the standard way you ask
> the database questions, a SELECT call. The database doesn't know
> anything about the CPU, running processes, or storage, so suggesting
> this path leads in that direction doesn't make any sense.

Well, the database does not really know much about "free disk space"
either - the only thing it knows is that it might not be able to
write data or execute a script, and unless you have shell/logfile
access you cannot diagnose those anyway, even with all the proposed
functions.

However, what I was really trying to say is that we should focus on
getting the code stable first, and that prettying it up with fancy
stat functions is something that really can and should be done in a
followup release, once we understand how the code behaves and maybe
also how it is likely going to evolve...

Stefan
From: Stefan Kaltenbrunner on 16 Jan 2010 12:22

Kevin Grittner wrote:
> Stefan Kaltenbrunner <stefan(a)kaltenbrunner.cc> wrote:
>> Kevin Grittner wrote:
>
>>> Right, we don't want to give the monitoring software an OS login
>>> for the database servers, for security reasons.
>> Depending on what exactly you mean by that, I do have to wonder how
>> you monitor more complex stuff (or stuff that requires elevated
>> privs) - say RAID health, multipath configuration, status of OS
>> level updates, "are certain processes running or not" as well as
>> basic parameters like CPU or IO load. As in, stuff you cannot know
>> unless you have it exported through "some" port.
>
> Many of those are monitored on the server one way or another,
> through a hardware card accessible only to the DBAs. The card sends
> an email to the DBAs for any sort of distress, including impending
> or actual drive failure, ambient temperature out of bounds, internal
> or external power out of bounds, etc. OS updates are managed by the
> DBAs through scripts. Ideally we would tie these in to our opcenter
> software, which displays status through hundreds of "LED" boxes on
> big plasma displays in our support areas (and can send emails and
> jabber messages when things get to a bad state), but since the
> messages are getting to the right people in a timely manner, this is
> a low priority as far as monitoring enhancement requests go.

Well, a lot of people (including myself) consider it a necessity to
aggregate all that stuff in your system monitoring; only that way can
you guarantee proper dependency handling (i.e. no need to page for
"webserver not running" if the whole server is down). There is also a
case to be made for statistics tracking and long-term monitoring of
stuff.

>
> Only the DBAs have OS logins to database servers. Monitoring
> software must deal with application ports (which have to be open
> anyway, so that doesn't add any security risk).
> Since the hardware
> monitoring doesn't know about file systems, and the disk space on
> database servers is primarily an issue for the database, it made
> sense to us to add the ability to check the space available to the
> database through a database connection. Hence, fsutil.

Still seems very backwards - there is much, much more that can only
be monitored from within the OS (and not from an external
iLO/RSA/IMM/DRAC/whatever) and that you cannot really do from within
the database (or any other application), so I'm still puzzled...

Stefan
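
The "dependency handling" point above (no page for "webserver not
running" when the whole server is down) can be illustrated with a small
Python sketch. This is only an illustration of the idea, not how any
particular monitoring product implements it. The function name
alerts_to_page and the check names in the usage note are hypothetical.

```python
def alerts_to_page(status, depends_on):
    """Return failing checks that are not explained by a failing dependency.

    status:     dict mapping check name -> True (healthy) or False (failing)
    depends_on: dict mapping check name -> list of checks it depends on

    A failing check is suppressed when anything upstream of it is failing
    too, so only the root cause generates a page.
    """

    def upstream_failed(check, seen=()):
        for parent in depends_on.get(check, []):
            if parent in seen:
                continue  # ignore dependency cycles instead of recursing forever
            # Unknown parents are treated as healthy (an assumption).
            if not status.get(parent, True):
                return True
            if upstream_failed(parent, seen + (check,)):
                return True
        return False

    return sorted(
        name for name, ok in status.items() if not ok and not upstream_failed(name)
    )
```

With status {"host": False, "webserver": False, "postgres": False} and
both services depending on "host", only "host" is returned. This kind
of suppression only works when host-level and application-level checks
are aggregated in the same monitoring system.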