From owner-svn-src-stable@FreeBSD.ORG Tue Oct 29 21:00:55 2013
Delivered-To: svn-src-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
	[IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 0E2222CE;
	Tue, 29 Oct 2013 21:00:55 +0000 (UTC)
	(envelope-from andre@FreeBSD.org)
Received: from svn.freebsd.org (svn.freebsd.org
	[IPv6:2001:1900:2254:2068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id D49A4270B;
	Tue, 29 Oct 2013 21:00:54 +0000 (UTC)
Received: from svn.freebsd.org ([127.0.1.70])
	by svn.freebsd.org (8.14.7/8.14.7) with ESMTP id r9TL0sRL034735;
	Tue, 29 Oct 2013 21:00:54 GMT (envelope-from andre@svn.freebsd.org)
Received: (from andre@localhost)
	by svn.freebsd.org (8.14.7/8.14.5/Submit) id r9TL0sRR034734;
	Tue, 29 Oct 2013 21:00:54 GMT (envelope-from andre@svn.freebsd.org)
Message-Id: <201310292100.r9TL0sRR034734@svn.freebsd.org>
From: Andre Oppermann
Date: Tue, 29 Oct 2013 21:00:54 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r257367 - stable/10/sys/netinet
X-SVN-Group: stable-10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SVN commit messages for all the -stable branches of the src tree
X-List-Received-Date: Tue, 29 Oct 2013 21:00:55 -0000

Author: andre
Date: Tue Oct 29 21:00:54 2013
New Revision: 257367
URL: http://svnweb.freebsd.org/changeset/base/257367

Log:
  MFC r256920:

  The TCP delayed ACK logic isn't aware of LRO passing up large aggregated
  segments thinking it received only one segment.
  This causes it to delay the ACK for 100ms to wait for another segment
  which may never come because all the data was received already.

  Doing delayed ACK for LRO segments is bogus for two reasons: a) it pushes
  us further away from acking every other packet; b) it introduces
  additional delay in responding to the sender.  The latter is especially
  bad because it is in the nature of LRO to aggregate all segments of a
  burst with no more coming until an ACK is sent back.

  Change the delayed ACK logic to detect LRO segments by being larger than
  the MSS for this connection and issuing an immediate ACK for them to keep
  the ACK clock ticking without interruption.

  Reported by:	julian, cperciva
  Tested by:	cperciva
  Reviewed by:	lstewart
  Approved by:	re (glebius)

Modified:
  stable/10/sys/netinet/tcp_input.c
Directory Properties:
  stable/10/sys/   (props changed)

Modified: stable/10/sys/netinet/tcp_input.c
==============================================================================
--- stable/10/sys/netinet/tcp_input.c	Tue Oct 29 20:53:09 2013	(r257366)
+++ stable/10/sys/netinet/tcp_input.c	Tue Oct 29 21:00:54 2013	(r257367)
@@ -508,10 +508,13 @@ do { \
  *	  the ack that opens up a 0-sized window and
  *		- delayed acks are enabled or
  *		- this is a half-synchronized T/TCP connection.
+ *	- the segment size is not larger than the MSS and LRO wasn't used
+ *	  for this segment.
  */
-#define DELAY_ACK(tp)							\
+#define DELAY_ACK(tp, tlen)						\
 	((!tcp_timer_active(tp, TT_DELACK) &&				\
 	    (tp->t_flags & TF_RXWIN0SENT) == 0) &&			\
+	    (tlen <= tp->t_maxopd) &&					\
 	    (V_tcp_delack_enabled || (tp->t_flags & TF_NEEDSYN)))
 
 /*
@@ -1863,7 +1866,7 @@ tcp_do_segment(struct mbuf *m, struct tc
 			}
 			/* NB: sorwakeup_locked() does an implicit unlock. */
 			sorwakeup_locked(so);
-			if (DELAY_ACK(tp)) {
+			if (DELAY_ACK(tp, tlen)) {
 				tp->t_flags |= TF_DELACK;
 			} else {
 				tp->t_flags |= TF_ACKNOW;
@@ -1954,7 +1957,7 @@ tcp_do_segment(struct mbuf *m, struct tc
 			 * If there's data, delay ACK; if there's also a FIN
 			 * ACKNOW will be turned on later.
 			 */
-			if (DELAY_ACK(tp) && tlen != 0)
+			if (DELAY_ACK(tp, tlen) && tlen != 0)
 				tcp_timer_activate(tp, TT_DELACK,
 				    tcp_delacktime);
 			else
@@ -2926,7 +2929,7 @@ dodata:							/* XXX */
 		if (th->th_seq == tp->rcv_nxt &&
 		    LIST_EMPTY(&tp->t_segq) &&
 		    TCPS_HAVEESTABLISHED(tp->t_state)) {
-			if (DELAY_ACK(tp))
+			if (DELAY_ACK(tp, tlen))
 				tp->t_flags |= TF_DELACK;
 			else
 				tp->t_flags |= TF_ACKNOW;