Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 28 Oct 2012 19:16:23 +0000 (UTC)
From:      Andre Oppermann <andre@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   svn commit: r242262 - head/sys/netinet
Message-ID:  <201210281916.q9SJGNwb080550@svn.freebsd.org>

next in thread | raw e-mail | index | archive | help
Author: andre
Date: Sun Oct 28 19:16:22 2012
New Revision: 242262
URL: http://svn.freebsd.org/changeset/base/242262

Log:
  Simplify and enhance the window change/update acceptance logic,
  especially in the presence of bi-directional data transfers.
  
  snd_wl1 tracks the right edge, including data in the reassembly
  queue, of valid incoming data.  This makes it like rcv_nxt plus
  reassembly.  It never goes backwards to prevent older, possibly
  reordered segments from updating the window.
  
  snd_wl2 tracks the left edge of sent data.  This makes it a duplicate
  of snd_una.  However joining them right now is difficult due to
  separate update dependencies in different places in the code flow.
  
  snd_wnd tracks the current advertized send window by the peer.  In
  tcp_output() the effective window is calculated by subtracting the
  already in-flight data, snd_nxt less snd_una, from it.
  
  ACK's become the main clock of window updates and will always update
  the window when the left edge of what we sent is advanced.  The ACK
  clock is the primary signaling mechanism in ongoing data transfers.
  This works reliably even in the presence of reordering, reassembly
  and retransmitted segments.  The ACK clock is most important because
  it determines how much data we are allowed to inject into the network.
  
  Zero window updates get us out of persistence mode are crucial.  Here
  a segment that neither moves ACK nor SEQ but enlarges WND is accepted.
  
  When the ACK clock is not active (that is we're not or no longer
  sending any data) any segment that moves the extended right SEQ edge,
  including out-of-order segments, updates the window.  This gives us
  updates especially during ping-pong transfers where the peer isn't
  done consuming the already acknowledged data from the receive buffer
  while responding with data.
  
  The SSH protocol is a prime candidate to benefit from the improved
  bi-directional window update logic as it has its own windowing
  mechanism on top of TCP and is frequently sending back protocol ACK's.
  
  Tcpdump provided by:	darrenr
  Tested by:	darrenr
  MFC after:	2 weeks

Modified:
  head/sys/netinet/tcp_input.c

Modified: head/sys/netinet/tcp_input.c
==============================================================================
--- head/sys/netinet/tcp_input.c	Sun Oct 28 19:02:07 2012	(r242261)
+++ head/sys/netinet/tcp_input.c	Sun Oct 28 19:16:22 2012	(r242262)
@@ -1714,7 +1714,7 @@ tcp_do_segment(struct mbuf *m, struct tc
 			 * Pull snd_wl1 up to prevent seq wrap relative to
 			 * th_seq.
 			 */
-			tp->snd_wl1 = th->th_seq;
+			tp->snd_wl1 = th->th_seq + tlen;
 			/*
 			 * Pull rcv_up up to prevent seq wrap relative to
 			 * rcv_nxt.
@@ -2327,7 +2327,6 @@ tcp_do_segment(struct mbuf *m, struct tc
 		if (tlen == 0 && (thflags & TH_FIN) == 0)
 			(void) tcp_reass(tp, (struct tcphdr *)0, 0,
 			    (struct mbuf *)0);
-		tp->snd_wl1 = th->th_seq - 1;
 		/* FALLTHROUGH */
 
 	/*
@@ -2638,12 +2637,10 @@ process_ACK:
 
 		SOCKBUF_LOCK(&so->so_snd);
 		if (acked > so->so_snd.sb_cc) {
-			tp->snd_wnd -= so->so_snd.sb_cc;
 			sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc);
 			ourfinisacked = 1;
 		} else {
 			sbdrop_locked(&so->so_snd, acked);
-			tp->snd_wnd -= acked;
 			ourfinisacked = 0;
 		}
 		/* NB: sowwakeup_locked() does an implicit unlock. */
@@ -2733,24 +2730,56 @@ step6:
 	INP_WLOCK_ASSERT(tp->t_inpcb);
 
 	/*
-	 * Update window information.
-	 * Don't look at window if no ACK: TAC's send garbage on first SYN.
+	 * Window update acceptance logic.  We have to be careful not
+	 * to accept window updates from old segments in the presence
+	 * of reordering or duplication.
+	 *
+	 * A window update is valid when:
+	 *  - the segment ACK's new data.
+	 *  - the segment carries new data and its ACK is current.
+	 *  - the segment matches the current SEQ and ACK but increases
+	 *    the window.  This is the escape from persist mode, if there
+	 *    data to be sent.
+	 *
+	 * XXXAO: The presence of new SACK information would allow to
+	 * accept window updates during retransmits.  We don't have an
+	 * easy way to test for that the moment.
+	 *
+	 * NB: The other side isn't allowed to shrink the window when
+	 * not sending or acking new data.  This behavior is strongly
+	 * discouraged by RFC793, section 3.7, page 42 anyways.
+	 *
+	 * XXXAO: tiwin >= minmss to avoid jitter?
 	 */
-	if ((thflags & TH_ACK) &&
-	    (SEQ_LT(tp->snd_wl1, th->th_seq) ||
-	    (tp->snd_wl1 == th->th_seq && (SEQ_LT(tp->snd_wl2, th->th_ack) ||
-	     (tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd))))) {
-		/* keep track of pure window updates */
-		if (tlen == 0 &&
-		    tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd)
+        if ((thflags & TH_ACK) && tiwin != tp->snd_wnd &&
+	    (SEQ_GT(th->th_ack, tp->snd_wl2) ||
+	     (th->th_ack == tp->snd_wl2 &&
+	      (SEQ_GT(th->th_seq + tlen, tp->snd_wl1) ||
+	      (th->th_seq == tp->snd_wl1 && tlen == 0 && tiwin > tp->snd_wnd))))) {
+#if 0
+		char *s;
+		if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) {
+			log(LOG_DEBUG, "%s; %s: window update %lu -> %lu\n",
+			    s, __func__, tp->snd_wnd, tiwin);
+			free(s, M_TCPLOG);
+		}
+#endif
+		/* Keep track of pure window updates. */
+		if (th->th_seq == tp->snd_wl1 && tlen == 0 &&
+		    tiwin > tp->snd_wnd)
 			TCPSTAT_INC(tcps_rcvwinupd);
+		/*
+		 * When the new window is larger, nudge output
+		 * as we may be able to send more data.
+		 */
+		if (tiwin > tp->snd_wnd)
+			needoutput = 1;
 		tp->snd_wnd = tiwin;
-		tp->snd_wl1 = th->th_seq;
-		tp->snd_wl2 = th->th_ack;
 		if (tp->snd_wnd > tp->max_sndwnd)
 			tp->max_sndwnd = tp->snd_wnd;
-		needoutput = 1;
 	}
+	if (SEQ_GT(th->th_ack, tp->snd_wl2))
+		tp->snd_wl2 = th->th_ack;
 
 	/*
 	 * Process segments with URG.
@@ -2870,6 +2899,8 @@ dodata:							/* XXX */
 			thflags = tcp_reass(tp, th, &tlen, m);
 			tp->t_flags |= TF_ACKNOW;
 		}
+		if (SEQ_GT(th->th_seq, tp->snd_wl1))
+			tp->snd_wl1 = th->th_seq + tlen;
 		if (tlen > 0 && (tp->t_flags & TF_SACK_PERMIT))
 			tcp_update_sack_list(tp, save_start, save_start + tlen);
 #if 0



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201210281916.q9SJGNwb080550>