From owner-svn-src-head@FreeBSD.ORG Sun Oct 28 19:16:23 2012 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 810A6601; Sun, 28 Oct 2012 19:16:23 +0000 (UTC) (envelope-from andre@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 59D108FC14; Sun, 28 Oct 2012 19:16:23 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.4/8.14.4) with ESMTP id q9SJGNQ7080552; Sun, 28 Oct 2012 19:16:23 GMT (envelope-from andre@svn.freebsd.org) Received: (from andre@localhost) by svn.freebsd.org (8.14.4/8.14.4/Submit) id q9SJGNwb080550; Sun, 28 Oct 2012 19:16:23 GMT (envelope-from andre@svn.freebsd.org) Message-Id: <201210281916.q9SJGNwb080550@svn.freebsd.org> From: Andre Oppermann Date: Sun, 28 Oct 2012 19:16:23 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r242262 - head/sys/netinet X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Oct 2012 19:16:23 -0000 Author: andre Date: Sun Oct 28 19:16:22 2012 New Revision: 242262 URL: http://svn.freebsd.org/changeset/base/242262 Log: Simplify and enhance the window change/update acceptance logic, especially in the presence of bi-directional data transfers. snd_wl1 tracks the right edge, including data in the reassembly queue, of valid incoming data. This makes it like rcv_nxt plus reassembly. It never goes backwards to prevent older, possibly reordered segments from updating the window. snd_wl2 tracks the left edge of sent data. This makes it a duplicate of snd_una. However joining them right now is difficult due to separate update dependencies in different places in the code flow. snd_wnd tracks the current advertized send window by the peer. In tcp_output() the effective window is calculated by subtracting the already in-flight data, snd_nxt less snd_una, from it. ACK's become the main clock of window updates and will always update the window when the left edge of what we sent is advanced. The ACK clock is the primary signaling mechanism in ongoing data transfers. This works reliably even in the presence of reordering, reassembly and retransmitted segments. The ACK clock is most important because it determines how much data we are allowed to inject into the network. Zero window updates get us out of persistence mode are crucial. Here a segment that neither moves ACK nor SEQ but enlarges WND is accepted. When the ACK clock is not active (that is we're not or no longer sending any data) any segment that moves the extended right SEQ edge, including out-of-order segments, updates the window. This gives us updates especially during ping-pong transfers where the peer isn't done consuming the already acknowledged data from the receive buffer while responding with data. The SSH protocol is a prime candidate to benefit from the improved bi-directional window update logic as it has its own windowing mechanism on top of TCP and is frequently sending back protocol ACK's. Tcpdump provided by: darrenr Tested by: darrenr MFC after: 2 weeks Modified: head/sys/netinet/tcp_input.c Modified: head/sys/netinet/tcp_input.c ============================================================================== --- head/sys/netinet/tcp_input.c Sun Oct 28 19:02:07 2012 (r242261) +++ head/sys/netinet/tcp_input.c Sun Oct 28 19:16:22 2012 (r242262) @@ -1714,7 +1714,7 @@ tcp_do_segment(struct mbuf *m, struct tc * Pull snd_wl1 up to prevent seq wrap relative to * th_seq. */ - tp->snd_wl1 = th->th_seq; + tp->snd_wl1 = th->th_seq + tlen; /* * Pull rcv_up up to prevent seq wrap relative to * rcv_nxt. @@ -2327,7 +2327,6 @@ tcp_do_segment(struct mbuf *m, struct tc if (tlen == 0 && (thflags & TH_FIN) == 0) (void) tcp_reass(tp, (struct tcphdr *)0, 0, (struct mbuf *)0); - tp->snd_wl1 = th->th_seq - 1; /* FALLTHROUGH */ /* @@ -2638,12 +2637,10 @@ process_ACK: SOCKBUF_LOCK(&so->so_snd); if (acked > so->so_snd.sb_cc) { - tp->snd_wnd -= so->so_snd.sb_cc; sbdrop_locked(&so->so_snd, (int)so->so_snd.sb_cc); ourfinisacked = 1; } else { sbdrop_locked(&so->so_snd, acked); - tp->snd_wnd -= acked; ourfinisacked = 0; } /* NB: sowwakeup_locked() does an implicit unlock. */ @@ -2733,24 +2730,56 @@ step6: INP_WLOCK_ASSERT(tp->t_inpcb); /* - * Update window information. - * Don't look at window if no ACK: TAC's send garbage on first SYN. + * Window update acceptance logic. We have to be careful not + * to accept window updates from old segments in the presence + * of reordering or duplication. + * + * A window update is valid when: + * - the segment ACK's new data. + * - the segment carries new data and its ACK is current. + * - the segment matches the current SEQ and ACK but increases + * the window. This is the escape from persist mode, if there + * data to be sent. + * + * XXXAO: The presence of new SACK information would allow to + * accept window updates during retransmits. We don't have an + * easy way to test for that the moment. + * + * NB: The other side isn't allowed to shrink the window when + * not sending or acking new data. This behavior is strongly + * discouraged by RFC793, section 3.7, page 42 anyways. + * + * XXXAO: tiwin >= minmss to avoid jitter? */ - if ((thflags & TH_ACK) && - (SEQ_LT(tp->snd_wl1, th->th_seq) || - (tp->snd_wl1 == th->th_seq && (SEQ_LT(tp->snd_wl2, th->th_ack) || - (tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd))))) { - /* keep track of pure window updates */ - if (tlen == 0 && - tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd) + if ((thflags & TH_ACK) && tiwin != tp->snd_wnd && + (SEQ_GT(th->th_ack, tp->snd_wl2) || + (th->th_ack == tp->snd_wl2 && + (SEQ_GT(th->th_seq + tlen, tp->snd_wl1) || + (th->th_seq == tp->snd_wl1 && tlen == 0 && tiwin > tp->snd_wnd))))) { +#if 0 + char *s; + if ((s = tcp_log_addrs(&tp->t_inpcb->inp_inc, th, NULL, NULL))) { + log(LOG_DEBUG, "%s; %s: window update %lu -> %lu\n", + s, __func__, tp->snd_wnd, tiwin); + free(s, M_TCPLOG); + } +#endif + /* Keep track of pure window updates. */ + if (th->th_seq == tp->snd_wl1 && tlen == 0 && + tiwin > tp->snd_wnd) TCPSTAT_INC(tcps_rcvwinupd); + /* + * When the new window is larger, nudge output + * as we may be able to send more data. + */ + if (tiwin > tp->snd_wnd) + needoutput = 1; tp->snd_wnd = tiwin; - tp->snd_wl1 = th->th_seq; - tp->snd_wl2 = th->th_ack; if (tp->snd_wnd > tp->max_sndwnd) tp->max_sndwnd = tp->snd_wnd; - needoutput = 1; } + if (SEQ_GT(th->th_ack, tp->snd_wl2)) + tp->snd_wl2 = th->th_ack; /* * Process segments with URG. @@ -2870,6 +2899,8 @@ dodata: /* XXX */ thflags = tcp_reass(tp, th, &tlen, m); tp->t_flags |= TF_ACKNOW; } + if (SEQ_GT(th->th_seq, tp->snd_wl1)) + tp->snd_wl1 = th->th_seq + tlen; if (tlen > 0 && (tp->t_flags & TF_SACK_PERMIT)) tcp_update_sack_list(tp, save_start, save_start + tlen); #if 0