From: cjharrer@comcast.net
To: freebsd-net@freebsd.org
Cc: "Christopher J. Harrer"
Date: Wed, 13 Jun 2012 21:02:57 +0000 (UTC)
Subject: Window updates during periods of HIGH packet loss

We are running FreeBSD 8.0 Stable and are running into an issue with window updates during periods of very high DUPLEX network traffic in which a good number of packets are being dropped. Please let me know if there is a better list for this question.

I'm going to use some small, made-up numbers to demonstrate what is going on. I can go into more detail with explicit numbers from an internal trace that I created, but it gets pretty long and tedious, so I'd like to see if my example makes sense first.

We have a server that is running a lot of NFS traffic (NFSv3 over TCP/IPv4) to a NetApp back-end filer (not sure the filer matters). For the purpose of this problem description, let's assume that I have a send and receive window of 10,000.

We have a lot of data outstanding in the network to the back-end filer, let's say 9,000 bytes (seq 1,000 to 9,999). The filer is sending us a lot of data concurrently with the 9,000 bytes we just sent. Let's assume our rcv_nxt is 1,000. We receive th_seq 10,000 (out of order) from the filer, and it ACKs all of our outstanding data. So snd_wl1 becomes 10,000 and snd_wl2 becomes 9,999. Our snd_wnd is now 10,000, so we begin to send new data (again, we blast it out, so let's assume we have sent 10,000 more bytes).

The filer is "resending" sequence numbers 1,000 through 9,999 because the new data we are sending contains SACK blocks instructing it to. The retransmitted data we are receiving also acks our newly sent data, so that by the time we receive the segment with th_seq 9,000, which runs up to 9,999 and completes our out-of-order processing, all of our data is acked.

Now, here's where the problem arises:

1) In processing a window update (step6 in tcp_input), the second check that is made is to ensure that tp->snd_wl1 < th->th_seq; in this case, it's not: 10,000 is not less than 9,000. The next check needs th->th_seq == tp->snd_wl1, which also fails, so no window update is done.

2) After tcp_reass handles the receipt of the last segment that fills in the "hole" in our stream, tp->t_flags |= TF_ACKNOW is set. This flag causes tcp_output to skip the check that starts the PERSIST timer, because it must force a send (in this case, the send is just an ACK). Any time tcp_reass returns, TF_ACKNOW is set.

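To make the check in point 1 concrete, here is a minimal, self-contained C sketch of it, replayed with the made-up numbers above. It is modeled on the window-update condition in tcp_input but is not the kernel code. The names struct mini_tcpcb, try_window_update() and replay_example() are hypothetical, as are the starting values of 999, the th_ack of 19,999 and the advertised window of 20,000 on the retransmitted segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wraparound-safe "a is before b", in the spirit of SEQ_LT() in netinet/tcp_seq.h. */
#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)

/* Hypothetical, trimmed-down stand-in for the struct tcpcb fields involved. */
struct mini_tcpcb {
    tcp_seq  snd_wl1;   /* th_seq of the segment used for the last window update */
    tcp_seq  snd_wl2;   /* th_ack of the segment used for the last window update */
    uint32_t snd_wnd;   /* send window as last accepted from the peer */
};

/*
 * Sketch of the window-update test: accept the advertised window only if the
 * segment is newer by sequence number, or equal in sequence and newer by ack,
 * or equal in both and advertising a larger window.
 */
bool
try_window_update(struct mini_tcpcb *tp, tcp_seq th_seq, tcp_seq th_ack,
    uint32_t tiwin)
{
    if (SEQ_LT(tp->snd_wl1, th_seq) ||
        (tp->snd_wl1 == th_seq &&
         (SEQ_LT(tp->snd_wl2, th_ack) ||
          (tp->snd_wl2 == th_ack && tiwin > tp->snd_wnd)))) {
        tp->snd_wnd = tiwin;
        tp->snd_wl1 = th_seq;
        tp->snd_wl2 = th_ack;
        return true;
    }
    return false;
}

/* Replays the example; returns whether the retransmitted segment updates the window. */
bool
replay_example(void)
{
    /* Starting values are assumptions; anything before seq 10,000 would do. */
    struct mini_tcpcb tp = { .snd_wl1 = 999, .snd_wl2 = 999, .snd_wnd = 10000 };

    /* Out-of-order segment: th_seq 10,000, acking everything up to 9,999. */
    bool first = try_window_update(&tp, 10000, 9999, 10000);
    assert(first && tp.snd_wl1 == 10000 && tp.snd_wl2 == 9999);

    /* Retransmitted segment: th_seq 9,000, with a hypothetical ack and window. */
    return try_window_update(&tp, 9000, 19999, 20000);
}
```

With these numbers, replay_example() returns false: the first segment moves snd_wl1 to 10,000, and the later segment with th_seq 9,000 then fails both the SEQ_LT test and the equality test, so its advertised window is ignored even though its ack is newer.
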
We've gotten a new send down while we were sending data into our open window, so now we're stuck: tp->snd_nxt == tp->snd_una == tp->snd_max and so_snd.sb_cc != 0, yet neither TT_PERSIST nor TT_REXMT is running. Eventually the filer sends us a FIN to close an "idle" client connection, which is normal operation in this configuration.

I have not looked at more recent versions of the FreeBSD code yet; I will start doing that now. I just wanted to ask the experts if I'm missing something here; it feels like I am.

Thanks in advance for any insight you can provide.

Regards,
Chris