From: Keith Keller on
On 2010-07-09, mjt <myswtestYOURSHOES(a)gmail.com> wrote:
>
> Some may say this suggestion is over the top, but I'd write
> a simple "c program" to launch multiple threads, each one
> responsible for each "machine ping", then offer different
> methods of communicating the results. Ask me how I know :)
>
> I did this years ago for the same reason ... I had it send
> me a text message for any machine(s) not responding.

It's not over the top, it's been done. I use nagios, but there are many
other programs that do similar tasks. I don't know what sort of IPMI
support nagios has, but a plugin can be written in any language fairly
easily if you know what the various cases are (i.e., when things are OK,
when you want to send a warning message, when you want to send a
critical alert).
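
To make those three cases concrete, here is a minimal sh sketch of the decision part of such a plugin. It assumes the usual Nagios plugin convention of exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL and 3 = UNKNOWN. The function name report_loss and the packet-loss thresholds are made up for illustration and are not part of Nagios itself.

```shell
#!/bin/sh
# Sketch only: maps a measured packet-loss percentage onto the
# Nagios plugin states (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
# report_loss LOSS WARN CRIT
#   LOSS  integer packet-loss percentage (how it is measured is up to you)
#   WARN  hypothetical warning threshold
#   CRIT  hypothetical critical threshold
report_loss() {
    loss=$1 warn=$2 crit=$3
    case $loss in
        # Empty or non-numeric input: we cannot tell what state we are in
        ''|*[!0-9]*) echo "UNKNOWN - could not determine packet loss"; return 3 ;;
    esac
    if [ "$loss" -ge "$crit" ]; then
        echo "CRITICAL - ${loss}% packet loss"; return 2
    elif [ "$loss" -ge "$warn" ]; then
        echo "WARNING - ${loss}% packet loss"; return 1
    fi
    echo "OK - ${loss}% packet loss"; return 0
}
```

Called as "report_loss 40 20 60", this prints "WARNING - 40% packet loss" and returns 1; a real plugin would finish with "exit $?" so that nagios sees the state.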

--keith

--
kkeller-usenet(a)wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information

From: Rahul on
marrgol <marspamrgol(a)gspammail.com> wrote in
news:4c37c584$0$17097$65785112@news.neostrada.pl:

> Then send one ping with the deadline, and if it succeeds continue
> with ipmitool:
>
>

Ah! Why didn't I think of this? :) Thanks! That works well.
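
For anyone reading along, the "ping first, then run the real command" pattern can be sketched in sh roughly as below. The function name run_if_alive, the overridable PING variable and the 5-second deadline are my own choices for illustration. The -c and -w options are those of the Linux iputils ping, so other ping implementations may need different flags. The ipmitool invocation is not shown in the quoted post, so it is left as whatever command you pass in.

```shell
#!/bin/sh
# Sketch only: run a command against a host only if one ping succeeds.
# PING can be overridden (e.g. for testing); it defaults to "ping".
PING=${PING:-ping}

# run_if_alive HOST CMD [ARGS...]
run_if_alive() {
    host=$1; shift
    # -c 1: send a single echo request
    # -w 5: overall deadline of 5 seconds (iputils ping; an assumption)
    if "$PING" -c 1 -w 5 "$host" >/dev/null 2>&1; then
        "$@"
    else
        echo "$host: no ping reply, skipping" >&2
        return 1
    fi
}
```

Usage would be along the lines of "run_if_alive node01 <your ipmitool command>", where node01 is a placeholder host name.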

--
Rahul

From: Rahul on
Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
news:2k6lg7xcct.ln2(a)goaway.wombat.san-francisco.ca.us:

> It's not over the top, it's been done. I use nagios, but there are many
> other programs that do similar tasks. I don't know what sort of IPMI
> support nagios has, but a plugin can be written in any language fairly
> easily if you know what the various cases are (i.e., when things are OK,
> when you want to send a warning message, when you want to send a
> critical alert).
>

Thanks for those options. I'm already running nagios so I could definitely
do this from within. I was just wondering if there was something simple like:

$ timeout 10 foocommand

Where foocommand would run, but if it exceeded 10 seconds it would be
killed. Maybe there is a way to write a C wrapper that does this? I'll
give it a shot, but I wanted to check whether a utility like this already
exists before reinventing the wheel.

For example, for issuing commands on multiple nodes I often use pssh, which
comes with built-in timeouts:

pssh -h hostfile -t timeout foocommand

One idea is to invoke pssh against the current node itself and see whether
that works.
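
The "run it, but kill it after N seconds" behaviour can also be sketched in plain sh without a C wrapper, roughly as below. This is a minimal sketch, not a hardened tool: the name run_with_timeout is made up, it sends only SIGTERM, and it does not guard against PID reuse. Newer GNU coreutils releases also ship a timeout command with this calling convention, so it is worth checking whether your system already has one.

```shell
#!/bin/sh
# Sketch only: run a command and send it SIGTERM if it is still
# running after SECS seconds.
# run_with_timeout SECS CMD [ARGS...]
run_with_timeout() {
    secs=$1; shift
    # Start the real command in the background and remember its PID
    "$@" &
    cmd_pid=$!
    # Watchdog: sleep for the allowed time, then terminate the command
    ( sleep "$secs"; kill -TERM "$cmd_pid" ) >/dev/null 2>&1 &
    watchdog_pid=$!
    # Collect the command's exit status (128 + signal number if killed)
    status=0
    wait "$cmd_pid" 2>/dev/null || status=$?
    # The command is done, so the watchdog is no longer needed.
    # Its sleep child may linger until the sleep runs out.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true
    return "$status"
}
```

With this, "run_with_timeout 10 foocommand" returns foocommand's own exit status if it finishes in time, and 143 (128 + SIGTERM) if the watchdog had to kill it.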

--
Rahul

From: Kenny McCormack on
In article <Xns9DB16BC8F9E706650A1FC0D7811DDBC81(a)81.169.183.62>,
Rahul <nospam(a)invalid.invalid> wrote:
>Keith Keller <kkeller-usenet(a)wombat.san-francisco.ca.us> wrote in
>news:2k6lg7xcct.ln2(a)goaway.wombat.san-francisco.ca.us:
>
>> It's not over the top, it's been done. I use nagios, but there are many
>> other programs that do similar tasks. I don't know what sort of IPMI
>> support nagios has, but a plugin can be written in any language fairly
>> easily if you know what the various cases are (i.e., when things are OK,
>> when you want to send a warning message, when you want to send a
>> critical alert).
>>
>
>Thanks for those options. I'm already running nagios so I could definitely
>do this from within. I was just wondering if there was something simple like:
>
>$ timeout 10 foocommand

I've been using this for decades. Works like a charm!

--- Cut Here ---
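#!/usr/bin/expect --
# Usage: maxtime [-q] <maxsecs> <cmd> [<arg(s)>]
# Runs <cmd> and exits with a timeout message if it has not finished
# after <maxsecs> seconds.
set echo ""
if {[lindex $argv 0] == "-q"} {
    # -q: do not echo the spawned command line
    set echo "-noecho"
    set argv [lrange $argv 1 end]
}
set timeout [lindex $argv 0]
if {[catch {eval spawn $echo [lrange $argv 1 end]}]} {
    puts "Usage: maxtime \[-q] <timeINsecs> <cmd> \[<arg(s)>]"
    exit -1
}
# Exit status used on timeout; override with the MAXTIME_exitval env variable
if {[catch {set exitval $env(MAXTIME_exitval)}]} { set exitval -1 }
expect timeout { puts stderr "[lindex $argv 1]: Timeout" ; exit $exitval }
# Element 3 of the list returned by [wait] is the command's exit status
exit [lindex [wait] 3]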
--- Cut Here ---

Now, note that there is one problem with this (not relevant to your use
case, but it *has* been a problem for me). That is, if the program you
are controlling goes into a device wait (the really bad case - the case
where the program can't be killed with the ubiquitous "kill -9"), then
the maxtime program will stall as well. This turns out to be a side
effect of adding support for catching the exit status of the controlled
command - i.e., of using [wait]. If you remove the [wait], then it will
work even in the "bad case". What I'd really like to do is to just
[wait] a little time - say a few seconds - then exit. I think it is
easy enough to do - I just haven't gotten around to it.
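
For illustration, the "wait a little, then exit" idea looks roughly like this when written in plain sh instead of Expect. It is only a sketch of the approach, not a patch to maxtime: the function name reap_with_grace and the return value 124 for "gave up" are arbitrary choices, and it polls with kill -0 once per second because sh has no timed wait.

```shell
#!/bin/sh
# Sketch only: wait up to GRACE seconds for a background child to go
# away, then give up instead of blocking forever in wait.
# reap_with_grace PID GRACE
reap_with_grace() {
    pid=$1 grace=$2
    # Poll once per second while the process still exists
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || break
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        # Still there (e.g. stuck in a device wait): do not call wait at all
        return 124   # arbitrary "gave up" value chosen for this sketch
    fi
    # The child is gone, so wait returns its exit status immediately
    status=0
    wait "$pid" 2>/dev/null || status=$?
    return "$status"
}
```

The intended use is to send the kill first and then call, say, "reap_with_grace $pid 3", so that a child that cannot be killed costs at most a few seconds.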

--
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

From: Rahul on
gazelle(a)shell.xmission.com (Kenny McCormack) wrote in
news:i1abf2$mbd$3@news.xmission.com:

> Now, note that there is one problem with this (not relevant to your use
> case, but it *has* been a problem for me). That is, if the program you
> are controlling goes into a device wait (the really bad case - the case
> where the program can't be killed with the ubiquitous "kill -9"), then
> the maxtime program will stall as well. This turns out to be a side
> effect of adding support for catching the exit status of the controlled
> command - i.e., of using [wait]. If you remove the [wait], then it will
> work even in the "bad case". What I'd really like to do is to just
> [wait] a little time - say a few seconds - then exit. I think it is
> easy enough to do - I just haven't gotten around to it.
>

Thanks very much! This works like a charm. That's exactly what I wanted to
achieve.

--
Rahul