From: J G Miller on
On Monday, July 12th, 2010, at 15:13:55 -0500, Ignoramus20495 wrote:
>
> They have the same time.
>
> ssh -l root server-a-1 date; ssh -l root server-a-2 date
> Mon Jul 12 15:07:05 CDT 2010
> Mon Jul 12 15:07:05 CDT 2010

That is not the same as using ntpq to check that both are synchronized
with the time server they use, though. ;)

What you should be checking is:

ssh -l root server-a-1 ntpq -c peers
ssh -l root server-a-2 ntpq -c peers
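If you want to run that check for both boxes in one go, here is a minimal sketch. It assumes POSIX sh, root ssh access as in the commands above, and the usual `ntpq -c peers` layout, where the first column of each peer line is the tally code: `*` marks the currently selected system peer and `o` a PPS peer. The function names `has_sync_peer` and `check_hosts` are made up for illustration.

```shell
#!/bin/sh
# has_sync_peer (hypothetical name): reads `ntpq -c peers` output on stdin
# and succeeds only if some peer line starts with the '*' (system peer)
# or 'o' (PPS peer) tally code. Header and '====' lines never match.
has_sync_peer() {
    grep -q '^[*o]'
}

# check_hosts (hypothetical name): runs ntpq on each host over ssh and
# reports whether it has a selected peer. Returns 1 if any host fails.
# If ssh itself fails, grep sees empty input, so the host is reported
# as not synchronized rather than silently skipped.
check_hosts() {
    status=0
    for host in "$@"; do
        if ssh -l root "$host" ntpq -c peers | has_sync_peer; then
            echo "$host: synchronized to a selected peer"
        else
            echo "$host: no selected peer (not synchronized)"
            status=1
        fi
    done
    return "$status"
}
```

Usage would be something like `check_hosts server-a-1 server-a-2`. A `*` only says ntpd has picked a peer; it is still worth eyeballing the `offset` column (milliseconds) on both hosts, which is far finer than the one-second resolution of `date`.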

> It kind of confirms my suspicions about drbd as the filesystem provider.

You need to consult a drbd expert then, or restart the drbd processes? ;)
From: Ignoramus23418 on
An update:

Restart of the NFS daemon did not help.

I restarted the secondary cluster server, then the primary (the
secondary took over). Now everything is running great. I guess 485
days is a bit too much for those servers to go at any one time.

i
From: Chris Ahlstrom on
Ignoramus23418 stopped playing his vuvuzela long enough to say:

> An update:
>
> Restart of the NFS daemon did not help.
>
> I restarted the secondary cluster server, then the primary (the
> secondary took over). Now everything is running great. I guess 485
> days is a bit too much for those servers to go at any one time.

Would have been nice to figure out what part was causing the issue,
though. But the bottom line is you're good to go for another 485 days.

--
When Dexter's on the Internet, can Hell be far behind?
From: Stan Bischof on
In comp.os.linux.misc Chris Ahlstrom <ahlstromc(a)launchmodem.com> wrote:
> Ignoramus23418 stopped playing his vuvuzela long enough to say:
>
>> secondary took over). Now everything is running great. I guess 485
>> days is a bit too much for those servers to go at any one time.
>
> Would have been nice to figure out what part was causing the issue,
> though. But the bottom line is you're good to go for another 485 days.

or 485 hours, or 485 minutes, since the root cause isn't known.

At least it is very easy to fix: just restart.

If it happens again on your watch, I would suggest more
investigation.

Stan
From: Ignoramus15939 on
On 2010-07-13, Chris Ahlstrom <ahlstromc(a)launchmodem.com> wrote:
> Ignoramus23418 stopped playing his vuvuzela long enough to say:
>
>> An update:
>>
>> Restart of the NFS daemon did not help.
>>
>> I restarted the secondary cluster server, then the primary (the
>> secondary took over). Now everything is running great. I guess 485
>> days is a bit too much for those servers to go at any one time.
>
> Would have been nice to figure out what part was causing the issue,
> though. But the bottom line is you're good to go for another 485 days.
>

I blame DRBD myself. After 485 more days, these servers need to be retired.

i