Prev: suppress NDRs from spoofed sender
Next: Exclude recipients from pre-queue spamass-milter checks?
From: Wietse Venema on 21 Jan 2010 10:30 It's nice for a change to work with someone who provides actual information, instead of that dork from last week who was just venting his opinions and not giving people a chance to help. In this case it looks like a Solaris TCP bug when sending data over a connection that suffers from heavy packet loss. Apparently, Solaris 10 stops retransmitting, and therefore the other side times out. I'm not wrapping the packets this time, so you will have to widen your window to read this message comfortably. Wietse The three-way TCP handshake show no wscale options, so we don't have to worry about borked firewalls mis-handling this: 09:16:40.786945 IP 213.207.90.2.59301 > 145.222.14.10.25: S 2466228028:2466228028(0) win 49640 <mss 1460,nop,nop,sackOK> 09:16:40.789806 IP 145.222.14.10.25 > 213.207.90.2.59301: S 3886146351:3886146351(0) ack 2466228029 win 5840 <mss 1380> 09:16:40.789829 IP 213.207.90.2.59301 > 145.222.14.10.25: . ack 1 win 49680 The handshake shows a round-trip time of 2.9ms. The receiver's MTU is smaller than 1460, which suggests that his packets are encapsulated in some other protocol. It does not matter for the problem at hand. Next is a segment from the middle of transmission. The sender has sent a full window up to byte 22233, but apparently there was major packet loss after byte 5673. 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869861 IP 213.207.90.2.59301 > 145.222.14.10.25: P 20853:22233(1380) ack 137 win 49680 09:16:51.869874 IP 213.207.90.2.59301 > 145.222.14.10.25: . 5673:7053(1380) ack 137 win 49680 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.870110 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.870111 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.870376 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:56.216513 IP 213.207.90.2.59301 > 145.222.14.10.25: . 5673:7053(1380) ack 137 win 49680 09:16:56.260383 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 7053 win 19320 09:16:56.260414 IP 213.207.90.2.59301 > 145.222.14.10.25: . 7053:8433(1380) ack 137 win 49680 09:16:56.260421 IP 213.207.90.2.59301 > 145.222.14.10.25: . 8433:9813(1380) ack 137 win 49680 09:16:56.263577 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 8433 win 22080 09:16:56.263588 IP 213.207.90.2.59301 > 145.222.14.10.25: . 9813:11193(1380) ack 137 win 49680 09:16:56.263610 IP 213.207.90.2.59301 > 145.222.14.10.25: . 11193:12573(1380) ack 137 win 49680 09:16:56.263844 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 9813 win 24840 09:16:56.263855 IP 213.207.90.2.59301 > 145.222.14.10.25: . 12573:13953(1380) ack 137 win 49680 09:16:56.263865 IP 213.207.90.2.59301 > 145.222.14.10.25: . 13953:15333(1380) ack 137 win 49680 09:16:56.266641 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 9813 win 24840 09:16:56.266776 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 11193 win 28980 09:16:56.266777 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 11193 win 28980 09:16:56.266800 IP 213.207.90.2.59301 > 145.222.14.10.25: . 15333:16713(1380) ack 137 win 49680 Sequences like this repeat through the entire session. This network connection is so bad that your machine can send only 107656 bytes in 85 seconds or 1.27 kbyte/s. I notice there are many sequences like this: 09:16:51.869844 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869845 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869846 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 and this: 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560 and this: 09:17:20.597737 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767 09:17:20.597738 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767 09:17:20.597739 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767 And many more. This is consistent with network congestion. These identical ACKs would have been sent several ms separated from each other (remember, the round-trip time is about 3ms). Apparently the ACKs were queued up behind other data that was clogging some pipe somewere, and so they ended up traveling together. Network congestion is also consistent with increased packet loss. And now for the surprise: after 09:18:16.515592 the sender stops retransmitting. That would be a sender TCP bug. The first thing that happens next is 30s later, at 09:18:46.512567, when the server reports a 421 timeout error and closes the connection. 09:18:16.512400 IP 213.207.90.2.59301 > 145.222.14.10.25: . 106413:107793(1380) ack 137 win 49680 09:18:16.512408 IP 213.207.90.2.59301 > 145.222.14.10.25: . 107793:109173(1380) ack 137 win 49680 09:18:16.512627 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 106413 win 32767 09:18:16.512644 IP 213.207.90.2.59301 > 145.222.14.10.25: . 109173:110553(1380) ack 137 win 49680 09:18:16.515557 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 107793 win 32767 09:18:16.515592 IP 213.207.90.2.59301 > 145.222.14.10.25: P 110553:111933(1380) ack 137 win 49680 09:18:16.515823 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win 32767 09:18:16.518354 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win 32767 [nothing for 30s] 09:18:46.512567 IP 145.222.14.10.25 > 213.207.90.2.59301: P 137:188(51) ack 107793 win 32767 09:18:46.513098 IP 145.222.14.10.25 > 213.207.90.2.59301: F 188:188(0) ack 107793 win 32767 The sender should not stop retransmitting like this. You may want to provide the complete trace to the Solaris maintainers.
From: Martijn de Munnik on 21 Jan 2010 11:18 On Thu, 21 Jan 2010 10:30:54 -0500 (EST), wietse(a)porcupine.org (Wietse Venema) wrote: > It's nice for a change to work with someone who provides actual > information, instead of that dork from last week who was just > venting his opinions and not giving people a chance to help. I don't know how one can help me if I don't supply info ;) > > In this case it looks like a Solaris TCP bug when sending data over > a connection that suffers from heavy packet loss. Apparently, > Solaris 10 stops retransmitting, and therefore the other side times > out. Ok, this is useful help. I did reroute the message via a linux postfix server and now it is successfully delivered. I try to find out if I can solve the Solaris bug but at least our customer is happy now! > > I'm not wrapping the packets this time, so you will have to > widen your window to read this message comfortably. With your explanation I'm able to read the tcp dump, one day I will fully understand it... > > Wietse > > The three-way TCP handshake show no wscale options, so we don't > have to worry about borked firewalls mis-handling this: > > 09:16:40.786945 IP 213.207.90.2.59301 > 145.222.14.10.25: S > 2466228028:2466228028(0) win 49640 <mss 1460,nop,nop,sackOK> > 09:16:40.789806 IP 145.222.14.10.25 > 213.207.90.2.59301: S > 3886146351:3886146351(0) ack 2466228029 win 5840 <mss 1380> > 09:16:40.789829 IP 213.207.90.2.59301 > 145.222.14.10.25: . ack 1 win 49680 > > The handshake shows a round-trip time of 2.9ms. The receiver's MTU > is smaller than 1460, which suggests that his packets are encapsulated > in some other protocol. It does not matter for the problem at hand. > > Next is a segment from the middle of transmission. The sender has > sent a full window up to byte 22233, but apparently there was major > packet loss after byte 5673. > > 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869861 IP 213.207.90.2.59301 > 145.222.14.10.25: P > 20853:22233(1380) ack 137 win 49680 > 09:16:51.869874 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 5673:7053(1380) ack 137 win 49680 > 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870110 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870111 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870376 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:56.216513 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 5673:7053(1380) ack 137 win 49680 > 09:16:56.260383 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 7053 win > 19320 > 09:16:56.260414 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 7053:8433(1380) ack 137 win 49680 > 09:16:56.260421 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 8433:9813(1380) ack 137 win 49680 > 09:16:56.263577 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 8433 win > 22080 > 09:16:56.263588 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 9813:11193(1380) ack 137 win 49680 > 09:16:56.263610 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 11193:12573(1380) ack 137 win 49680 > 09:16:56.263844 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 9813 win > 24840 > 09:16:56.263855 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 12573:13953(1380) ack 137 win 49680 > 09:16:56.263865 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 13953:15333(1380) ack 137 win 49680 > 09:16:56.266641 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 9813 win > 24840 > 09:16:56.266776 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 11193 win > 28980 > 09:16:56.266777 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 11193 win > 28980 > 09:16:56.266800 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 15333:16713(1380) ack 137 win 49680 > > Sequences like this repeat through the entire session. This network > connection is so bad that your machine can send only 107656 bytes > in 85 seconds or 1.27 kbyte/s. > > I notice there are many sequences like this: > > 09:16:51.869844 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869845 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869846 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > > and this: > > 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > > and this: > > 09:17:20.597737 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > 09:17:20.597738 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > 09:17:20.597739 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > > And many more. This is consistent with network congestion. These > identical ACKs would have been sent several ms separated from each > other (remember, the round-trip time is about 3ms). Apparently the > ACKs were queued up behind other data that was clogging some pipe > somewere, and so they ended up traveling together. > > Network congestion is also consistent with increased packet loss. > > And now for the surprise: after 09:18:16.515592 the sender stops > retransmitting. That would be a sender TCP bug. > > The first thing that happens next is 30s later, at 09:18:46.512567, > when the server reports a 421 timeout error and closes the connection. > > 09:18:16.512400 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 106413:107793(1380) ack 137 win 49680 > 09:18:16.512408 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 107793:109173(1380) ack 137 win 49680 > 09:18:16.512627 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 106413 win > 32767 > 09:18:16.512644 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 109173:110553(1380) ack 137 win 49680 > 09:18:16.515557 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 107793 win > 32767 > 09:18:16.515592 IP 213.207.90.2.59301 > 145.222.14.10.25: P > 110553:111933(1380) ack 137 win 49680 > 09:18:16.515823 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win > 32767 > 09:18:16.518354 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win > 32767 > [nothing for 30s] > 09:18:46.512567 IP 145.222.14.10.25 > 213.207.90.2.59301: P 137:188(51) > ack 107793 win 32767 > 09:18:46.513098 IP 145.222.14.10.25 > 213.207.90.2.59301: F 188:188(0) ack > 107793 win 32767 > > The sender should not stop retransmitting like this. You may want > to provide the complete trace to the Solaris maintainers. -- YoungGuns Kasteleinenkampweg 7b 5222 AX 's-Hertogenbosch T. 073 623 56 40 F. 073 623 56 39 www.youngguns.nl KvK 18076568
First
|
Prev
|
Pages: 1 2 3 Prev: suppress NDRs from spoofed sender Next: Exclude recipients from pre-queue spamass-milter checks? |