From: Martijn de Munnik on 20 Jan 2010 11:50

On Wed, 20 Jan 2010 07:20:01 -0500 (EST), wietse(a)porcupine.org (Wietse Venema) wrote:
> Martijn de Munnik:
>> Hi list,
>>
>> I have a problem with delivering mail to a host and get this error:
>>
>> host mx2.amsterdam.nl[145.222.14.10] said: 421 enepmx02.amsterdam.nl
>> Error: timeout exceeded (in reply to end of DATA command)
>>
>> This error only seems to occur with 'large' mails. Currently I have
>> mails of ~600KB and ~8MB stuck in the queue. I don't think this is a
>> postfix issue on our site but an issue with the mailserver on the
>> other site. What can cause such issues?
>
> Record a tcpdump trace. The way the session fails will indicate
> the kind of problem (MTU, Window scaling, and so on).
>
> http://www.postfix.org/DEBUG_README.html
>
> Wietse

Ok, I tried that and I'm not really sure what to look for. I opened the
tcpdump file in wireshark and there are a lot of warnings and notes in the
file.

--
Notes:
Duplicate ACK(#1) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#2) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#3) [145.222.14.10 -> 213.207.90.2]
Duplicate ACK(#4) [145.222.14.10 -> 213.207.90.2]
..
..
..
Duplicate ACK(#44) [145.222.14.10 -> 213.207.90.2]
Retransmission (suspected) [213.207.90.2 -> 145.222.14.10]

Warnings:
Fast retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
Out-Of-Order segment [213.207.90.2 -> 145.222.14.10]
--

This is abracadabra for me ;)

Martijn

--
YoungGuns
Kasteleinenkampweg 7b
5222 AX 's-Hertogenbosch
T. 073 623 56 40
F. 073 623 56 39
www.youngguns.nl
KvK 18076568
From: Wietse Venema on 20 Jan 2010 12:40

Martijn de Munnik:
> Ok, I tried that and I'm not really sure where to look for. I opened the
> tcpdump file in wireshark and there are a lot of warnings and notes in the
> file.
>
> --
> Notes:
> Duplicate ACK(#1) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#2) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#3) [145.222.14.10 -> 213.207.90.2]
> Duplicate ACK(#4) [145.222.14.10 -> 213.207.90.2]
> .
> .
> .
> Duplicate ACK(#44) [145.222.14.10 -> 213.207.90.2]
> Retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
>
> Warnings:
> Fast retransmission (suspected) [213.207.90.2 -> 145.222.14.10]
> Out-Of-Order segment [213.207.90.2 -> 145.222.14.10]
> --
>
> This is abracadabra for me ;)

If you can make the "tcpdump -nr /file/name" output available then
people who understand TCP/IP can look at it.

Wietse
From: Wietse Venema on 20 Jan 2010 15:22

Here's the TCP initial handshake:

17:30:44.951789 IP 213.207.90.2.48147 > 145.222.14.10.25: S 50514820:50514820(0) win 49640 <mss 1460,nop,wscale 0,nop,nop,sackOK>
17:30:44.954496 IP 145.222.14.10.25 > 213.207.90.2.48147: S 4148480248:4148480248(0) ack 50514821 win 5840 <mss 1380,nop,wscale 2>
17:30:44.954519 IP 213.207.90.2.48147 > 145.222.14.10.25: . ack 1 win 49680

Later, as the receiver processes the network packets, it acknowledges
the data received and sends its receive window size (how much more it
is willing to receive).

Above, with "wscale 2" the server at 145.222.14.10 announces that
its TCP receive window value needs to be multiplied by a factor of
4 (binary number shifted left by 2).

But, there is a broken router in the path that does not understand
window scaling. Here is an example of what gets f-ed up:

17:30:45.412222 IP 213.207.90.2.48147 > 145.222.14.10.25: . 20853:22233(1380) ack 137 win 49680
17:30:45.412230 IP 213.207.90.2.48147 > 145.222.14.10.25: . 22233:23613(1380) ack 137 win 49680
17:30:45.412249 IP 213.207.90.2.48147 > 145.222.14.10.25: P 23613:24993(1380) ack 137 win 49680
17:30:45.412747 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800
17:30:45.412748 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800
17:30:45.412749 IP 145.222.14.10.25 > 213.207.90.2.48147: P ack 8433 win 5800

The receiver says it can receive bytes 8433-31633 (8433 + 4 * 5800),
but the broken router does not know that 5800 needs to be multiplied
by 4, and it thinks the receiver can receive only bytes 8433-14233
(8433 + 5800).

The broken router then throws away the bytes with higher sequence
numbers than 14233.

Workaround: turn off window scaling support on the sender's kernel.

Wietse
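The window arithmetic in the message above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `right_window_edge` and `segment_fits` are made up for this example, and the numbers are the ones taken from the trace (ack 8433, win 5800, wscale 2, last segment ending at 24993).

```python
def right_window_edge(ack: int, win: int, wscale: int = 0) -> int:
    """Return the sequence number just past the advertised receive window.

    The 16-bit window field is shifted left by the negotiated scale factor.
    """
    return ack + (win << wscale)


def segment_fits(segment_end: int, edge: int) -> bool:
    """True if a segment ending at segment_end lies inside the window."""
    return segment_end <= edge


ack, win = 8433, 5800

# What the receiver means: 5800 shifted left by 2, i.e. multiplied by 4.
scaled_edge = right_window_edge(ack, win, wscale=2)
# What a device that ignores window scaling assumes.
unscaled_edge = right_window_edge(ack, win, wscale=0)

print(scaled_edge)    # 31633
print(unscaled_edge)  # 14233

# The segment 23613:24993 from the trace is valid for the receiver,
# but looks out-of-window to the scaling-unaware device.
print(segment_fits(24993, scaled_edge))    # True
print(segment_fits(24993, unscaled_edge))  # False
```

The sender is entitled to transmit up to 31633, so everything it sends past 14233 is legitimate from its own point of view, yet gets discarded in transit. That matches the duplicate ACKs and retransmissions reported earlier in the thread.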
From: Victor Duchovni on 20 Jan 2010 15:28

On Wed, Jan 20, 2010 at 03:22:56PM -0500, Wietse Venema wrote:
> The broken router then throws away the bytes with higher sequence
> numbers than 14233.
>
> Workaround: turn off window scaling support on the sender's kernel.

This problem is sufficiently common, that on Linux MTAs I always add:

    net.ipv4.tcp_window_scaling = 0

to sysctl.conf. Adjust for other systems as necessary. This hurts
long-haul throughput, but email tolerates latency, provided most of your
outbound traffic is not a high-bandwidth channel to Mars (but then you
would not be using TCP anyway...)

--
Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.
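To confirm that a sysctl.conf really carries the setting suggested above, a small parser is enough. The following is a minimal sketch, not a replacement for sysctl(8): the helper name `sysctl_value` and the sample text are assumptions made for illustration, and the parser only handles plain `key = value` lines plus `#` and `;` comments.

```python
from typing import Optional


def sysctl_value(conf_text: str, key: str) -> Optional[str]:
    """Return the value assigned to key in sysctl.conf-style text, or None.

    If the key is assigned more than once, the last assignment is returned,
    since settings in a file are applied in order.
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()
    return value


# Hypothetical file content, for illustration only.
sample = """\
# net.ipv4.tcp_window_scaling = 1
net.ipv4.ip_forward = 0
net.ipv4.tcp_window_scaling = 0
"""

print(sysctl_value(sample, "net.ipv4.tcp_window_scaling"))  # 0
```

On Linux, an entry in /etc/sysctl.conf takes effect at boot or after `sysctl -p`, and the running value can be read from /proc/sys/net/ipv4/tcp_window_scaling.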
From: Wietse Venema on 20 Jan 2010 20:22

Wietse Venema:
> You can do
>
> ndd /dev/tcp \?
>
> to find out what parameters are supported. On my Solaris9 and
> Solaris10 test boxes it is called tcp_wscale_always.
>
> According to Solaris10 documentation:
>
> When this parameter is enabled, which is the default setting
> [since Solaris10], TCP always sends a SYN segment with the
> window scale option, even if the window scale option value is
> 0.

With the default tcp_wscale_always setting, making a connection
from a Solaris 10 box to FreeBSD 8.0:

20:13:59.808828 IP 168.100.189.17.32799 > 168.100.189.10.25: Flags [S], seq 118377775, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
20:13:59.808892 IP 168.100.189.10.25 > 168.100.189.17.32799: Flags [S.], seq 538094055, ack 118377776, win 65535, options [mss 1460,nop,wscale 3,sackOK,eol], length 0
20:13:59.809327 IP 168.100.189.17.32799 > 168.100.189.10.25: Flags [.], ack 1, win 49640, length 0

Same system with tcp_wscale_always set to zero:

20:14:52.736959 IP 168.100.189.17.32800 > 168.100.189.10.25: Flags [S], seq 131413865, win 49640, options [mss 1460,nop,nop,sackOK], length 0
20:14:52.737016 IP 168.100.189.10.25 > 168.100.189.17.32800: Flags [S.], seq 3072042607, ack 131413866, win 65535, options [mss 1460,sackOK,eol], length 0
20:14:52.737581 IP 168.100.189.17.32800 > 168.100.189.10.25: Flags [.], ack 1, win 49640, length 0

Thus, with tcp_wscale_always set to zero, Solaris 10 does not send
wscale, and neither should the remote server.

If this does not make your mail move, then you need to collect
another tcpdump recording. In that case, mail is not moving because
of more than one problem.

Wietse
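The comparison of the two handshakes above can be automated when checking a new recording. The sketch below assumes tcpdump text output in which the option appears as `wscale N`, as in both trace formats quoted in this thread; the function names `offered_wscale` and `scaling_in_effect` are hypothetical. It relies on the rule that window scaling is used only when both the SYN and the SYN-ACK carry the option.

```python
import re
from typing import Optional

# Matches the window scale option as printed by tcpdump, e.g. "wscale 2".
_WSCALE = re.compile(r"\bwscale (\d+)")


def offered_wscale(packet_line: str) -> Optional[int]:
    """Return the window scale shift offered in a tcpdump line, or None."""
    match = _WSCALE.search(packet_line)
    return int(match.group(1)) if match else None


def scaling_in_effect(syn_line: str, synack_line: str) -> bool:
    """True only if both sides sent the window scale option."""
    return (offered_wscale(syn_line) is not None
            and offered_wscale(synack_line) is not None)


# Option lists copied from the two handshakes in the message above.
default_syn = "Flags [S], options [mss 1460,nop,wscale 0,nop,nop,sackOK]"
default_synack = "Flags [S.], options [mss 1460,nop,wscale 3,sackOK,eol]"
disabled_syn = "Flags [S], options [mss 1460,nop,nop,sackOK]"
disabled_synack = "Flags [S.], options [mss 1460,sackOK,eol]"

print(scaling_in_effect(default_syn, default_synack))    # True
print(scaling_in_effect(disabled_syn, disabled_synack))  # False
```

Note that an offer of `wscale 0` still counts as sending the option: it permits the peer to scale its own window, which is exactly the situation in the failing session.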