From: Nataraj on 3 May 2010 20:32 Hi, I would appreciate any suggestions anyone can offer on the following problem that I'm having with postfix... I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server. I see what looks likes a server in stress mode as described in http://www.postfix.org/STRESS_README.html except the odd think about it is that the server is not heavily loaded and I sure can't see where it's exceeding any process limits. What's even odder is it doesn't appear that the stress code is implemented in this version. If I telnet to port 25 I get an immediate SMTP greeting followed in 10 seconds by 421 4.4.2 mymail.com Error: timeout exceeded and the connection being closed. The following maillog entry is logged: May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT from 173-12-149-200.client.myisp.com[173.12.149.200] This is like this constantly. I see 0-4 smtpd processes on the server at any one time (I'm not sure if it's limited at 4, I just haven't seen more). There are a similar number of policy daemons. There is a "-" in the maxproc field for smtpd in master.cf. From everything I can tell the default is a limit of 100. I do run a policy daemon (vpostmaster). I've changed its maxproc field to 0 per the recommendation in the STRESS_README (and restarted postfix). It's master.cf entry looks like this... vpm-pfpolicy unix - n n - 0 spawn user=vpostmaster argv=/usr/lib/vpostmaster/postfix/vpm-pfpolicy I also get lots of log entries like this for timeouts on the policy daemon: May 2 05:36:20 aspen postfix/spawn[6003]: warning: /usr/lib/vpostmaster/postfix/vpm-pfpolicy: process id 6004: command time limit exceeded and occasionally similar timeouts on the transport daemon (which is part of vpostmaster as well). My system load (which is running on a Vmware ESXi virtual machine) is: 16:59:36 up 2 days, 22:17, 3 users, load average: 0.08, 0.02, 0.01 Then, on outbound mail, I found this one site that adds delays of 18 seconds before the helo message to their SMTP server. My server cannot get a message through to that server at all. I tried adding -o stress= for the smtpd and nothing changed. The system does not automatically add the stress parameter to smtpd if I don't add it myself, so I'm not inclined to believe that Centos/Redhat 5.4 has the stress patch applied. I do not see any slowness in the DNS servers. I have at most 2 RBL lists that I check. I did have limits set on the number of connections and timeouts etc, but I've restored them all to the defaults for purposes of debugging. I get these timeouts even when there's only one smtpd process. Some of my config parameters right now are: ipc_timeout = 3600s command_time_limit = 1000s smtpd_error_sleep_time = 1s smtpd_policy_service_timeout = 100s smtpd_proxy_timeout = 100s smtpd_starttls_timeout = 300s smtpd_timeout = 300s smtpd_tls_session_cache_timeout = 3600s smtpd_client_connection_count_limit = 50 smtpd_client_connection_rate_limit = 0 smtpd_client_message_rate_limit = 0 smtpd_client_new_tls_session_rate_limit = 0 smtpd_client_recipient_rate_limit = 0 smtpd_hard_error_limit = 20 smtpd_junk_command_limit = 100 smtpd_recipient_limit = 1000 smtpd_recipient_overshoot_limit = 1000 smtpd_soft_error_limit = 10 THank you, Nataraj
From: Nataraj on 4 May 2010 08:59 lst_hoe02(a)kwsoft.de wrote: > Zitat von Nataraj <incoming-postfix(a)rjl.com>: > >> Hi, >> >> I would appreciate any suggestions anyone can offer on the following >> problem that I'm having with postfix... >> >> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server. I >> see what looks likes a server in stress mode as described in >> http://www.postfix.org/STRESS_README.html except the odd think about >> it is that the server is not heavily loaded and I sure can't see >> where it's exceeding any process limits. What's even odder is it >> doesn't appear that the stress code is implemented in this version. >> >> If I telnet to port 25 I get an immediate SMTP greeting followed in >> 10 seconds by >> 421 4.4.2 mymail.com Error: timeout exceeded >> and the connection being closed. >> The following maillog entry is logged: >> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT >> from 173-12-149-200.client.myisp.com[173.12.149.200] > > Why do you think your Postfix server is "stressed"?? The automatic > stress-dependant features are introduced in version 2.5 as far as i > know so your Postfix does not support -o stress at all. If you only > have 4 smtpd running and your server show greeting immediately when > telneting to port 25 all should be fine. I don't think it is a stress condition. The problem is that it is timing everything out. Further checking shows, it times out inbound SMTP connections in like 3-4 seconds and fails outbound deliveries to slow servers. The transports also timeout and bounce messages. > > Regards > > Andreas >
From: Nataraj on 4 May 2010 11:21 lst_hoe02(a)kwsoft.de wrote: > Zitat von Nataraj <incoming-postfix(a)rjl.com>: > >> lst_hoe02(a)kwsoft.de wrote: >>> Zitat von Nataraj <incoming-postfix(a)rjl.com>: >>> >>>> Hi, >>>> >>>> I would appreciate any suggestions anyone can offer on the >>>> following problem that I'm having with postfix... >>>> >>>> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 server. >>>> I see what looks likes a server in stress mode as described in >>>> http://www.postfix.org/STRESS_README.html except the odd think >>>> about it is that the server is not heavily loaded and I sure can't >>>> see where it's exceeding any process limits. What's even odder is >>>> it doesn't appear that the stress code is implemented in this version. >>>> >>>> If I telnet to port 25 I get an immediate SMTP greeting followed in >>>> 10 seconds by >>>> 421 4.4.2 mymail.com Error: timeout exceeded >>>> and the connection being closed. >>>> The following maillog entry is logged: >>>> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT >>>> from 173-12-149-200.client.myisp.com[173.12.149.200] >>> >>> Why do you think your Postfix server is "stressed"?? The automatic >>> stress-dependant features are introduced in version 2.5 as far as i >>> know so your Postfix does not support -o stress at all. If you only >>> have 4 smtpd running and your server show greeting immediately when >>> telneting to port 25 all should be fine. >> >> I don't think it is a stress condition. The problem is that it is >> timing everything out. Further checking shows, it times out inbound >> SMTP connections in like 3-4 seconds and fails outbound deliveries to >> slow servers. The transports also timeout and bounce messages. > > Sorry i still don't understand what the problem is... > Are you concerned about "timeout after SOMETHING" messages in the logs? > Do you have problems with mails not arriving? > Do you have problems with mails not leaving? > > Please try to explain, maybe with log data for a problematic mail. > > Regards > > Andreas > "fails outbound deliveries to slow servers." means there are problems with outbound mail deliveries. Yes there are complaints of people sending mail from the outside whose delivery is either delayed or does not get through. This means there are problems with inbound mail. An SMTP server that times out after 4 seconds of inactivity is not reasonable given the possibility of delays in others mail systems and on the internet. The question that I am asking is what else can cause these timeouts if there is no stress code in my current version of postfix? Nataraj
From: Charles Gregory on 4 May 2010 11:26 On Tue, 4 May 2010, Nataraj wrote: >> > > > If I telnet to port 25 I get an immediate SMTP greeting followed in >> > > > 10 seconds by >> > > > 421 4.4.2 mymail.com Error: timeout exceeded Firstly, this is meaningless. All SMTP connections will time out quickly if no initial commands are received. If you really want to test your machine's timeout, enter MAI FROM and RCPT TO and DATA, send at least one line, then wait.... > Yes there are complaints of people sending mail from the outside whose > delivery is either delayed or does not get through. This means there are > problems with inbound mail. An SMTP server that times out after 4 > seconds of inactivity is not reasonable given the possibility of delays > in others mail systems and on the internet. It is possible that the timeout is NOT in postfix. Please post some logs shownig the actual 'timeout' error message. - C
From: Nataraj on 4 May 2010 12:20
lst_hoe02(a)kwsoft.de wrote: > Zitat von Nataraj <incoming-postfix(a)rjl.com>: > >> lst_hoe02(a)kwsoft.de wrote: >>> Zitat von Nataraj <incoming-postfix(a)rjl.com>: >>> >>>> lst_hoe02(a)kwsoft.de wrote: >>>>> Zitat von Nataraj <incoming-postfix(a)rjl.com>: >>>>> >>>>>> Hi, >>>>>> >>>>>> I would appreciate any suggestions anyone can offer on the >>>>>> following problem that I'm having with postfix... >>>>>> >>>>>> I'm running postfix+pgsql-2.3.3-2.1.el5_2 on a CentOS 5.4 >>>>>> server. I see what looks likes a server in stress mode as >>>>>> described in http://www.postfix.org/STRESS_README.html except the >>>>>> odd think about it is that the server is not heavily loaded and I >>>>>> sure can't see where it's exceeding any process limits. What's >>>>>> even odder is it doesn't appear that the stress code is >>>>>> implemented in this version. >>>>>> >>>>>> If I telnet to port 25 I get an immediate SMTP greeting followed >>>>>> in 10 seconds by >>>>>> 421 4.4.2 mymail.com Error: timeout exceeded >>>>>> and the connection being closed. >>>>>> The following maillog entry is logged: >>>>>> May 3 16:44:06 mymail postfix/smtpd[22573]: timeout after CONNECT >>>>>> from 173-12-149-200.client.myisp.com[173.12.149.200] >>>>> >>>>> Why do you think your Postfix server is "stressed"?? The automatic >>>>> stress-dependant features are introduced in version 2.5 as far as >>>>> i know so your Postfix does not support -o stress at all. If you >>>>> only have 4 smtpd running and your server show greeting >>>>> immediately when telneting to port 25 all should be fine. >>>> >>>> I don't think it is a stress condition. The problem is that it is >>>> timing everything out. Further checking shows, it times out >>>> inbound SMTP connections in like 3-4 seconds and fails outbound >>>> deliveries to slow servers. The transports also timeout and bounce >>>> messages. >>> >>> Sorry i still don't understand what the problem is... >>> Are you concerned about "timeout after SOMETHING" messages in the logs? >>> Do you have problems with mails not arriving? >>> Do you have problems with mails not leaving? >>> >>> Please try to explain, maybe with log data for a problematic mail. >>> >>> Regards >>> >>> Andreas >>> >> "fails outbound deliveries to slow servers." means there are problems >> with outbound mail deliveries. Yes there are complaints of people >> sending mail from the outside whose delivery is either delayed or >> does not get through. This means there are problems with inbound >> mail. An SMTP server that times out after 4 seconds of inactivity is >> not reasonable given the possibility of delays in others mail systems >> and on the internet. > > Please show the relevant log entries for a failed delivery attempt > from connect until disconnect. Postfix has reasonable defaults and for > sure no 4s timeout. > >> The question that I am asking is what else can cause these timeouts >> if there is no stress code in my current version of postfix? > > The smtpd_<mumble>_timeout settings are relevant on postfix side. The > default for smtpd is 300s. If your settings are at default or far from > the values you are seeing you should check if some stateful firewall > in between drops "inactive" connections earlier. If this does not help > you should capture a tcpdump to see what is going on. > > Regards > > Andreas > Hi Andreas, I did post my configuration parameters in my first message. I mentioned that I had changed them, but had restored the defaults to see if that would solve the problem. The smtpd_timeout was never changed from 300s. Here is a log of a failed outbound delivery. As I mentioned, the sysadmin for this server told me they have an 18 second delay before sending the greeting, which they've found effective at stopping spambots. I've done crazier things myself, and I think that a properly working mailserver should be able to deliver to them even if it is a bit unfriendly. Certainly I was able to deliver to them before this problem started. May 2 05:22:32 aspen postfix/smtp[6575]: 4CF471D0200: to=<xxxxxx(a)cs.rutgers.edu>, relay=mail.cs.rutgers.edu[128.6.4.3]:25, delay=124076, delays=124072/0.05/4\ ..3/0, dsn=4.4.2, status=deferred (conversation with mail.cs.rutgers.edu[128.6.4.3] timed out while receiving the initial server greeting) It most definitely is valid to telnet to an smtp server and type commands to test it. I've been doing this for years. It was only until this problem came up that I am no longer able to do it because I can't type fast enough before it times me out (unless I enable pipelining and paste the commands). I maintain the firewall and there are no logs of problems there. I'm fairly certain this is the SMTP server. If a firewall was timing out the connection postfix would not know that it was a timeout and would not be able to log a timeout error. penguin> telnet aspen 25 Trying 66.35.48.14... Connected to aspen. Escape character is '^]'. 220 aspen.rjl.com ESMTP vPostMaster 421 4.4.2 aspen.rjl.com Error: timeout exceeded Connection closed by foreign host. From this log of the telnet session, the 421 error comes from the smtp server and running tcpdump is useless because the problem is at the SMTP level and not at the TCP or IP level. Thank you. I appreciate your efforts to help. I guess I will have to dig into the code and see if I can figure out what is going on, but I was hoping somebody else might have seen a problem like this that would save me the efforts of debugging the code. I may try upgrading to a more recent release of postfix first. Nataraj |