Date:      Wed, 13 Jun 2012 21:40:07 +0000 (UTC)
From:      cjharrer@comcast.net
To:        freebsd-net@freebsd.org
Subject:   Re: Window updates during periods of HIGH packet loss
Message-ID:  <930457657.42273.1339623607713.JavaMail.root@sz0092a.westchester.pa.mail.comcast.net>
In-Reply-To: <2047697620.40515.1339621377861.JavaMail.root@sz0092a.westchester.pa.mail.comcast.net>

Oops, left one part out below... 


When snd_nxt == snd_una == snd_max, snd_wnd is also 0, which is why we can't send any new data.

----- Original Message -----
From: cjharrer@comcast.net 
To: freebsd-net@freebsd.org 
Cc: "Christopher J. Harrer" <cjharrer@comcast.net> 
Sent: Wednesday, June 13, 2012 5:02:57 PM 
Subject: Window updates during periods of HIGH packet loss 



We're running FreeBSD 8.0 Stable and are running into an issue with
window updates during periods of very high DUPLEX network
traffic where a good number of network packets are being
dropped. Please let me know if there is a better list for this
question.


I'm going to use some small, made up numbers to demonstrate what is 
going on. I can go into more detail with explicit numbers from an 
internal trace that I created, but it gets pretty long and tedious, so 
I'd like to see if my example makes sense first. 


We have a server that is running a lot of NFS traffic (NFSv3 over 
TCP/IPv4) to a NetApp back-end filer (not sure the filer matters). 


For the purpose of this problem description, let's assume that I have 
a send and receive window of 10,000. 


We have a lot of data outstanding in the network to the back-end filer
(let's say 9,000 bytes, seq 1,000 to 9,999). The filer is sending us a
lot of data concurrently with the 9,000 bytes we just sent. Let's assume
our rcv_nxt is 1,000.


We receive th_seq of 10,000 (out of order) from the filer and it
ACKs all of our outstanding data. So, snd_wl1 becomes 10,000 and
snd_wl2 becomes 9,999. Our snd_wnd is now 10,000, so we begin to send
new data (again, we blast it out, so let's assume we have 10,000 more
bytes sent).


The filer is "resending" sequence numbers 1,000 through 9,999 because the
new data we are sending contains SACK blocks instructing it to. The
retransmitted data we are receiving is also acking our newly sent data, so
that by the time we receive the segment with th_seq 9,000 (which runs up to
9,999 and completes our out-of-order processing) all of our data is acked.
Now, here's where the problem arises:


1) In processing a window update (step6 in tcp_input), the 2nd check that is
made is to ensure that tp->snd_wl1 < th->th_seq. In this case, it's not:
10,000 is not less than 9,000. The next check needs th->th_seq ==
tp->snd_wl1, which also fails, so no window update is done.


2) After tcp_reass handles the receipt of the last segment that fills in the
"hole" in our stream, tp->t_flags |= TF_ACKNOW (this flag causes tcp_output
to skip the check to start the PERSIST timer, because it must force
a send; in this case, the send is just an ACK). Any time tcp_reass
returns, TF_ACKNOW is set.
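
To make the check in 1) concrete, here is a minimal sketch of the
window-update test as described above, fed with the made-up numbers from
this example. It is a user-space model, not the actual tcp_input code: the
struct win_state type and the helpers seq_lt() and window_update_ok() are
hypothetical names, and the final "window grew" clause is my assumption
about the rest of the condition.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Wrap-safe "a < b" on 32-bit sequence numbers (same idea as SEQ_LT). */
static bool seq_lt(tcp_seq a, tcp_seq b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical, trimmed-down stand-in for the send-window fields of a tcpcb. */
struct win_state {
    tcp_seq  snd_wl1;  /* th_seq of the segment used for the last window update */
    tcp_seq  snd_wl2;  /* th_ack of the segment used for the last window update */
    uint32_t snd_wnd;  /* current send window */
};

/*
 * Returns true if a segment with the given th_seq/th_ack/window would be
 * accepted as a window update under the checks described in 1).
 */
static bool window_update_ok(const struct win_state *tp, tcp_seq th_seq,
                             tcp_seq th_ack, uint32_t tiwin)
{
    return seq_lt(tp->snd_wl1, th_seq) ||
        (tp->snd_wl1 == th_seq &&
         (seq_lt(tp->snd_wl2, th_ack) ||
          (tp->snd_wl2 == th_ack && tiwin > tp->snd_wnd)));
}
```

With snd_wl1 = 10,000 and snd_wl2 = 9,999, a retransmitted segment with
th_seq = 9,000 is rejected no matter what ACK or window it carries (the
ACK value itself does not matter here), while any segment with th_seq
above 10,000 would be accepted.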


A new send came down while we were sending data into our open window,
so now we're stuck: tp->snd_nxt == tp->snd_una == tp->snd_max
and so_snd.sb_cc != 0, yet neither TT_PERSIST nor TT_REXMT is running.
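
The stuck state can be written down as a single predicate, which may be
handy for an assertion or a trace point. This is only a sketch: struct
conn_state and is_stalled() are hypothetical names, the fields are a
flattened copy of the values mentioned in this thread, and snd_wnd == 0
comes from the follow-up note at the top of this message.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Hypothetical snapshot of the fields discussed in this thread. */
struct conn_state {
    tcp_seq  snd_una;          /* oldest unacknowledged sequence number */
    tcp_seq  snd_nxt;          /* next sequence number to send */
    tcp_seq  snd_max;          /* highest sequence number sent */
    uint32_t snd_wnd;          /* send window */
    uint32_t sb_cc;            /* bytes queued in so_snd */
    bool     persist_running;  /* TT_PERSIST armed? */
    bool     rexmt_running;    /* TT_REXMT armed? */
};

/*
 * True when data is queued, nothing is in flight, the window is closed,
 * and no timer is armed that would ever trigger another send.
 */
static bool is_stalled(const struct conn_state *c)
{
    return c->snd_nxt == c->snd_una &&
           c->snd_una == c->snd_max &&
           c->snd_wnd == 0 &&
           c->sb_cc != 0 &&
           !c->persist_running &&
           !c->rexmt_running;
}
```

If either timer were running, the predicate would be false: a persist
probe or a retransmit would eventually elicit a fresh window from the
filer.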


Eventually the filer sends us a FIN to close an "idle" client 
connection; which is normal operation in this configuration. 


I have not looked at more recent versions of the FreeBSD code yet; I will
start doing that now. I just wanted to ask the experts if I'm missing
something here, because it feels like I am.


Thanks in advance for any insight you can provide. 


Regards, 


Chris 


