From: Keith Keller on 9 Jul 2010 23:39

On 2010-07-09, mjt <myswtestYOURSHOES(a)gmail.com> wrote:
>
> Some may say this suggestion is over the top, but I'd write
> a simple "c program" to launch multiple threads, each one
> responsible for each "machine ping", then offer different
> methods of communicating the results. Ask me how I know :)
>
> I did this years ago for the same reason ... I had it send
> me a text message for any machine(s) not responding.

It's not over the top, it's been done. I use nagios, but there are many
other programs that do similar tasks. I don't know what sort of IPMI
support nagios has, but a plugin can be written in any language fairly
easily if you know what the various cases are (i.e., when things are OK,
when you want to send a warning message, when you want to send a
critical alert).

--keith

-- 
kkeller-usenet(a)wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

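The three cases Keith lists map onto the exit statuses that Nagios plugins conventionally return: 0 for OK, 1 for WARNING, 2 for CRITICAL, plus 3 for UNKNOWN. The following is only a rough Python sketch of that mapping, not anything posted in the thread. The thresholds, the "IPMI" service label, and the names classify, status_line and main are all hypothetical.

```python
# Exit statuses defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed_s, warn_s=5.0, crit_s=10.0):
    """Map a check's response time to a Nagios state.

    elapsed_s is None when the machine never answered. The thresholds
    are hypothetical defaults, not values from the thread.
    """
    if elapsed_s is None:
        return CRITICAL
    if elapsed_s < 0:
        return UNKNOWN  # a negative duration means the measurement is broken
    if elapsed_s >= crit_s:
        return CRITICAL
    if elapsed_s >= warn_s:
        return WARNING
    return OK


def status_line(service, state, detail):
    """Build the one-line summary a plugin prints on stdout."""
    return f"{service} {LABELS[state]} - {detail}"


def main(argv):
    """Hypothetical entry point: argv[0] is a response time in seconds, or 'none'."""
    try:
        elapsed = None if argv[0] == "none" else float(argv[0])
    except (IndexError, ValueError):
        print(status_line("IPMI", UNKNOWN, "usage: plugin <seconds|none>"))
        return UNKNOWN
    state = classify(elapsed)
    print(status_line("IPMI", state, f"response time: {argv[0]}"))
    return state
```

To use it as an actual plugin, the script would end with sys.exit(main(sys.argv[1:])) (after an import sys) so that Nagios sees the state as the process exit status.
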
From: Rahul on 10 Jul 2010 11:31

marrgol <marspamrgol(a)gspammail.com> wrote in
news:4c37c584$0$17097$65785112@news.neostrada.pl:

> Then send one ping with the deadline, and if it succeeds continue
> with ipmitool:
>
>

Ah! why didn't I think of this! :) Thanks! That works well.

-- 
Rahul

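The "ping first, then continue" idea can be sketched roughly as below, in Python rather than shell. This is an illustration only: the function names are hypothetical, the ping flags assume the Linux iputils ping (-c for packet count, -w for an overall deadline in seconds), and the follow-up command is left as a parameter because the ipmitool invocation is not shown in the quoted text.

```python
import subprocess
import sys


def ping_probe(host, deadline_s=5):
    """Build a one-packet ping command.

    Assumes Linux iputils ping: -c 1 sends a single packet and
    -w sets an overall deadline in seconds.
    """
    return ["ping", "-c", "1", "-w", str(deadline_s), host]


def probe_then_run(probe_cmd, follow_cmd):
    """Run follow_cmd only if probe_cmd exits 0.

    Returns follow_cmd's exit status, or None if the probe failed
    and follow_cmd was never started.
    """
    probe = subprocess.run(
        probe_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if probe.returncode != 0:
        return None
    return subprocess.run(follow_cmd).returncode


def main(argv):
    """Hypothetical CLI: <host> <cmd> [<arg(s)>]"""
    if len(argv) < 2:
        print("usage: <host> <cmd> [<arg(s)>]", file=sys.stderr)
        return 2
    status = probe_then_run(ping_probe(argv[0]), argv[1:])
    if status is None:
        print(f"{argv[0]}: no ping reply, skipping", file=sys.stderr)
        return 1
    return status
```

Because the probe has its own deadline, an unreachable node costs at most a few seconds before the slower follow-up command is skipped.
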
From: Rahul on 10 Jul 2010 11:35

Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
news:2k6lg7xcct.ln2(a)goaway.wombat.san-francisco.ca.us:

> It's not over the top, it's been done. I use nagios, but there are many
> other programs that do similar tasks. I don't know what sort of IPMI
> support nagios has, but a plugin can be written in any language fairly
> easily if you know what the various cases are (i.e., when things are OK,
> when you want to send a warning message, when you want to send a
> critical alert).
>

Thanks for those options. I'm already running nagios so I could definitely
do this from within. I was just wondering if there was something simple like:

$$timeout 10 foocommand

Where foocommand would run, but if it exceeded 10 seconds then it would be
killed. Maybe there is a way to write a C wrapper that does this? I'll give
it a shot, but wanted to check whether any utility like this one already
exists before reinventing the wheel.

E.g., for issuing commands on multiple nodes I often use pssh. This does
come with builtin timeouts.

pssh -h hostfile -t timeout foocommand

An idea is to maybe invoke pssh on the current node itself and see if it
works.

-- 
Rahul

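For what it's worth, GNU coreutils includes a timeout command with this calling convention, so "timeout 10 foocommand" may already work where it is installed. Where it is not, a wrapper along these lines is one possibility. It is a minimal Python sketch, not the C wrapper Rahul mentions; the names run_with_timeout and main are hypothetical, and returning 124 on timeout is an assumption borrowed from GNU timeout.

```python
import subprocess
import sys

# Assumption: reuse 124, the status GNU timeout returns when the limit is hit.
TIMEOUT_STATUS = 124


def run_with_timeout(cmd, seconds):
    """Run cmd (a list of arguments); kill it if it runs longer than seconds."""
    try:
        # subprocess.run kills the child and reaps it before raising
        # TimeoutExpired, so no zombie is left behind.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        print(f"{cmd[0]}: Timeout", file=sys.stderr)
        return TIMEOUT_STATUS


def main(argv):
    """Hypothetical CLI: <seconds> <cmd> [<arg(s)>]"""
    if len(argv) < 2:
        print("usage: <seconds> <cmd> [<arg(s)>]", file=sys.stderr)
        return 2
    return run_with_timeout(argv[1:], float(argv[0]))
```

Ending the script with sys.exit(main(sys.argv[1:])) would pass the wrapped command's exit status through to the caller, which is what a Nagios check or a shell loop would look at.
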
From: Kenny McCormack on 10 Jul 2010 13:42

In article <Xns9DB16BC8F9E706650A1FC0D7811DDBC81(a)81.169.183.62>,
Rahul <nospam(a)invalid.invalid> wrote:
>Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
>news:2k6lg7xcct.ln2(a)goaway.wombat.san-francisco.ca.us:
>
>> It's not over the top, it's been done. I use nagios, but there are many
>> other programs that do similar tasks. I don't know what sort of IPMI
>> support nagios has, but a plugin can be written in any language fairly
>> easily if you know what the various cases are (i.e., when things are OK,
>> when you want to send a warning message, when you want to send a
>> critical alert).
>>
>
>Thanks for those options. I'm already running nagios so I could definitely
>do this from within. I was just wondering if there was something simple like:
>
>$$timeout 10 foocommand

I've been using this for decades. Works like a charm!

--- Cut Here ---
#!/usr/bin/expect --
# Usage: maxtime [-q] <maxsecs> <cmd> [<arg(s)>]

# -q suppresses spawn's echo of the command line.
set echo ""
if {[lindex $argv 0] == "-q"} {
    set echo "-noecho"
    set argv [lrange $argv 1 end]
}

# Expect's built-in "timeout" variable bounds the expect call below.
set timeout [lindex $argv 0]
if [catch {eval spawn $echo [lrange $argv 1 end]}] {
    puts "Usage: maxtime \[-q] <timeINsecs> <cmd> \[<arg(s)>]"
    exit -1
}

# Exit status to use on timeout; override with $MAXTIME_exitval.
if [catch {set exitval $env(MAXTIME_exitval)}] { set exitval -1 }

expect timeout { puts stderr "[lrange $argv 1 1]: Timeout" ; exit $exitval }

# Element 3 of [wait] is the exit status of the spawned command.
exit [lrange [wait] 3 3]
--- Cut Here ---

Now, note that there is one problem with this (not relevant to your use
case, but it *has* been a problem for me). That is, if the program you
are controlling goes into a device wait (the really bad case - the case
where the program can't be killed with the ubiquitous "kill -9"), then
the maxtime program will stall as well. This turns out to be a side
effect of adding support for catching the exit status of the controlled
command - i.e., of using [wait]. If you remove the [wait], then it will
work even in the "bad case".
What I'd really like to do is to just [wait] a little time - say a few
seconds - then exit. I think it is easy enough to do - I just haven't
gotten around to it.

-- 
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

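The "wait a little, then give up" behaviour Kenny describes could look roughly like this. It is a Python analogue for illustration only, not a patch to the Expect script above; run_bounded and its grace parameter are hypothetical names.

```python
import subprocess
import sys


def run_bounded(cmd, seconds, grace=3.0):
    """Run cmd with a time limit and never block indefinitely on the child.

    Returns the child's exit status, or None if it hit the time limit.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        print(f"{cmd[0]}: Timeout", file=sys.stderr)
        # SIGKILL on POSIX. A process stuck in device wait will not act
        # on it until the blocking operation finishes.
        proc.kill()
    try:
        # Give the child a few seconds to die so it can be reaped.
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # still unkillable: stop waiting instead of stalling with it
    return None
```

The trade-off is the one from the post: the exit status is still collected in the normal case, and in the "bad case" the wrapper returns after at most seconds + grace, leaving the stuck child behind.
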
From: Rahul on 10 Jul 2010 17:10

gazelle(a)shell.xmission.com (Kenny McCormack) wrote in
news:i1abf2$mbd$3@news.xmission.com:

> Now, note that there is one problem with this (not relevant to your use
> case, but it *has* been a problem for me). That is, if the program you
> are controlling goes into a device wait (the really bad case - the case
> where the program can't be killed with the ubiquitous "kill -9"), then
> the maxtime program will stall as well. This turns out to be a side
> effect of adding support for catching the exit status of the controlled
> command - i.e., of using [wait]. If you remove the [wait], then it will
> work even in the "bad case". What I'd really like to do is to just
> [wait] a little time - say a few seconds - then exit. I think it is
> easy enough to do - I just haven't gotten around to it.
>

Thanks very much! This works like a charm. That's exactly what I wanted to
achieve.

-- 
Rahul