From: Kal-El on 11 May 2006 11:46

We are running a Solaris 2.8 NIS server on an Ultra 10. We have several clients that talk to it. At various times throughout the day, we'll notice the _server_ (let's call it sun1 here and our NIS domain sun.yp) reporting, "NIS server not responding for domain "sun.yp"".

Maybe two or three times a day, we'll have a window of 15 minutes or so when we get these errors... again, _from the NIS server_ on the NIS server. If that goes on for long enough, the clients start reporting it as well. But it usually starts with the NIS server first reporting that it can't see itself.

Usually, the error will occur, and then, sometimes within the same second, it starts to work again, like:

May 10 12:22:48 sun1 ypbind[12604]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:22:58 sun1 ypbind[12659]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:23:02 sun1 ypbind[12727]: [ID 647655 daemon.error] NIS server for domain "sun.yp" OK
May 10 12:40:42 sun1 ypbind[19420]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:40:42 sun1 ypbind[19475]: [ID 647655 daemon.error] NIS server for domain "sun.yp" OK

So, for example, this happened today at 4:10am, 9:20am, 10:20am, and (as seen above) 12:22pm, 12:23pm, and 12:40pm. The problem usually goes away for a few hours, then comes back and fixes itself. However, as I mentioned before, if the NIS server loses connection to its _own_ NIS service for a few seconds or more, the clients can start to notice too, and then hang.

Anyone know what's going on here? How can an NIS server lose connection to itself? Note that the NIS server also runs sendmail, bind, spamassassin, and is an imap and pop server. This server has been in place for years, but we haven't really noticed these problems until a few months ago.

I've tried to find something in the logs that is consistently there at the time these problems occur, but haven't found a pattern. For example, today, I see a lot of sendmail errors that happen to be occurring at the same time:

sendmail[540]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root): getrequests: accept: Software caused connection abort

But it's not always the case that the NIS server loses connection to itself at the same times we get errors like the sendmail one above.

Any ideas? Any bugs? Needed patches?

Thanks!
Kal
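One way to look for a pattern is to turn the ypbind messages into outage windows (first "not responding" to the next "OK") and compare those against other log entries. The following is only a rough sketch in Python, assuming the syslog format shown in the post; the function name `outage_windows` and the `year` default are illustrative choices, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Sample lines in the same format as the ypbind messages in the post.
SAMPLE_LOG = """
May 10 12:22:48 sun1 ypbind[12604]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:22:58 sun1 ypbind[12659]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:23:02 sun1 ypbind[12727]: [ID 647655 daemon.error] NIS server for domain "sun.yp" OK
May 10 12:40:42 sun1 ypbind[19420]: [ID 337329 daemon.error] NIS server not responding for domain "sun.yp"; still trying
May 10 12:40:42 sun1 ypbind[19475]: [ID 647655 daemon.error] NIS server for domain "sun.yp" OK
"""

# Timestamp, host, then the ypbind tag. \s* tolerates a missing space
# between the time and the host name.
HEADER_RE = re.compile(
    r'^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s*\S+\s+ypbind\[\d+\]:'
)


def outage_windows(lines, year=2006):
    """Return (start, end, duration) tuples for each outage.

    An outage starts at the first "not responding" message and ends at
    the next "OK" message. `year` is an assumption supplied by the caller.
    """
    windows = []
    start = None
    for line in lines:
        match = HEADER_RE.match(line)
        if not match:
            continue  # not a ypbind line
        stamp = datetime.strptime(f"{year} {match.group(1)}",
                                  "%Y %b %d %H:%M:%S")
        if "not responding" in line:
            if start is None:
                start = stamp
        elif line.rstrip().endswith("OK") and start is not None:
            windows.append((start, stamp, stamp - start))
            start = None
    return windows


for begin, end, length in outage_windows(SAMPLE_LOG.splitlines()):
    print(begin.time(), "to", end.time(), "lasted", length)
```

With the windows in hand, the timestamps of the sendmail errors (or cron jobs, or load spikes) can be checked against each interval instead of being compared by eye.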
From: Chris Cox on 11 May 2006 12:02

Kal-El wrote:
> We are running a Solaris 2.8 NIS server on an Ultra 10. We have several
> clients that talk to it. At various times throughout the day, we'll
> notice the _server_ (lets call it sun1 here and our nis domain sun.yp)
> reporting, "NIS server not responding for domain "sun.yp".
>
> Maybe 2 or three times a day, we'll have a 15 or so minute window when
> we'll get these errors... again, _from the NIS server_ on the NIS
> server. So, when that happens, and if that happens for a long enough
> period, the clients start reporting that as well. But, it usually
> starts with the NIS server first reporting that it can't see itself.
>
> Usually, the error will occur, and then, sometimes within the same
> second, it starts to work again, like:
>
> May 10 12:22:48 sun1 ypbind[12604]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:22:58 sun1 ypbind[12659]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:23:02sun1 ypbind[12727]: [ID 647655 daemon.error] NIS server
> for domain "sun.yp" OK
> May 10 12:40:42sun1 ypbind[19420]: [ID 337329 daemon.error] NIS server
> not responding for domain "sun.yp"; still trying
> May 10 12:40:42 sun1 ypbind[19475]: [ID 647655 daemon.error] NIS server
> for domain "sun.yp" OK
> ....
>
> sendmail[540]: [ID 801593 mail.crit] NOQUEUE: SYSERR(root):
> getrequests: accept: Software caused connection abort
> ....
>
> Any ideas? Any bugs? Needed patches?

Network problem? Is the NIS accessing itself via localhost?? How is it reaching itself? Could still be network related. Could be something else (hardware, etc., as you have said).

Anything weird running via cron?

.... a mystery...
From: Kal-El on 11 May 2006 12:38

Nothing weird in cron... Cron has been the same for years, and this is something that started kicking in when we started loading more onto the machine (like spamassassin, and imapping of large inboxes, 100 megs or more). I'm really thinking there's just too much running on the system.

Not a network problem as far as I can see.... even moved the server and one of its clients to a single switch.

Not sure what you're asking when you say, "Is the NIS accessing itself via localhost??". In this specific case, the server and the client are the same. The client accesses the NIS service on the machine by the hostname of the computer.

Again, things work great most of the time, but then we have these periods where we get these "cannot connect" errors to a service running on the same machine as the client.

Steve
From: Chris Cox on 11 May 2006 20:01

Kal-El wrote:
> Nothing weird in cron... Cron has been the same for years, and this is
> something that started kicking in when we started loading more onto the
> machine (like spamassassin, and imapping of large inboxes --100megs or
> more). I'm really thinking there's just too much running on the
> system.
>
> Not a network problem as far as I can see.... even moved the server it
> one of its clients to a single switch.
>
> Not sure what you're asking when you say, "Is the NIS accessing itself
> via localhost??". In this specific case, the server and the client
> are the same. The client accesses the NIS service on the machine by
> the hostname of the computer.

Does a ypwhich yield localhost or the machine by name? It "might" make a difference.

>
> Again, things work great most of the time, but then we have these
> periods where we get these "cannot connect" errors to a service running
> on the same machine as the client.

Still a mystery to me...
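The ypwhich question can be answered mechanically if it needs checking on several machines. This is a hedged sketch, not something from the thread: it assumes `ypwhich` with no arguments prints the name of the bound NIS server, and the function names and category labels are made up for illustration.

```python
import socket
import subprocess

# Names treated as "bound via loopback" (an assumption for this sketch).
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def classify_binding(ypwhich_output, hostname):
    """Classify the server name printed by ypwhich.

    Returns "loopback", "self-by-name", or "other". Names are compared
    on their short (unqualified) form, case-insensitively.
    """
    server = ypwhich_output.strip().lower()
    if server in LOOPBACK_NAMES:
        return "loopback"
    if server.split(".")[0] == hostname.lower().split(".")[0]:
        return "self-by-name"
    return "other"


def current_binding():
    """Run ypwhich on this machine and classify the result."""
    result = subprocess.run(["ypwhich"], capture_output=True,
                            text=True, check=True)
    return classify_binding(result.stdout, socket.gethostname())
```

Calling `current_binding()` on the NIS server would show whether ypbind is bound to localhost or to the machine's own name, which is the distinction being asked about.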
From: gorman on 12 May 2006 10:12

Hi,

Two questions:
Do you have any NIS slaves that are unreachable?
At which line in /etc/hosts is the NIS master's entry?

We had a similar problem a while ago which was a combination of these two.
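For the second question, the position of the entry can be read straight out of the hosts file. Below is a small sketch under stated assumptions: the addresses in the sample are hypothetical, `hosts_entry_lines` is an illustrative name, and the thread does not say why the line position mattered, so this only reports where the entry sits.

```python
# Hypothetical hosts file contents; addresses are placeholders.
SAMPLE_HOSTS = """# hypothetical /etc/hosts
127.0.0.1 localhost
192.168.1.10 sun1 loghost
192.168.1.11 sun2
"""


def hosts_entry_lines(hosts_text, name):
    """Return (line_number, address) for each entry that names `name`.

    Line numbers are 1-based. Comments after '#' are ignored, and `name`
    may appear as the canonical name or as an alias.
    """
    hits = []
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and name in fields[1:]:
            hits.append((lineno, fields[0]))
    return hits


print(hosts_entry_lines(SAMPLE_HOSTS, "sun1"))
```

On a real system the text would come from reading /etc/hosts, with the NIS master's hostname passed as `name`.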