From owner-svn-src-all@freebsd.org Fri Aug 26 06:19:13 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AC981BC6B8B; Fri, 26 Aug 2016 06:19:13 +0000 (UTC) (envelope-from sephe@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6F7661FAE; Fri, 26 Aug 2016 06:19:13 +0000 (UTC) (envelope-from sephe@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id u7Q6JCh3064046; Fri, 26 Aug 2016 06:19:12 GMT (envelope-from sephe@FreeBSD.org) Received: (from sephe@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id u7Q6JCWs064045; Fri, 26 Aug 2016 06:19:12 GMT (envelope-from sephe@FreeBSD.org) Message-Id: <201608260619.u7Q6JCWs064045@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: sephe set sender to sephe@FreeBSD.org using -f From: Sepherosa Ziehau Date: Fri, 26 Aug 2016 06:19:12 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r304836 - stable/10/sys/netinet X-SVN-Group: stable-10 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Aug 2016 06:19:13 -0000 Author: sephe Date: Fri Aug 26 06:19:12 2016 New Revision: 304836 URL: https://svnweb.freebsd.org/changeset/base/304836 Log: MFC 303766 tcp/lro: If timestamps mismatch or it's a FIN, force flush. This keeps the segments/ACK/FIN delivery order. Before this patch, it was observed: if A sent FIN immediately after an ACK, B would deliver FIN first to the TCP stack, then the ACK. This out-of-order delivery causes one unnecessary ACK sent from B. Reviewed by: gallatin, hps Obtained from: rrs, gallatin Sponsored by: Netflix (rrs, gallatin), Microsoft (sephe) Differential Revision: https://reviews.freebsd.org/D7415 Modified: stable/10/sys/netinet/tcp_lro.c Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/netinet/tcp_lro.c ============================================================================== --- stable/10/sys/netinet/tcp_lro.c Fri Aug 26 05:37:44 2016 (r304835) +++ stable/10/sys/netinet/tcp_lro.c Fri Aug 26 06:19:12 2016 (r304836) @@ -382,6 +382,7 @@ tcp_lro_rx(struct lro_ctrl *lc, struct m tcp_seq seq; int error, ip_len, l; uint16_t eh_type, tcp_data_len; + int force_flush = 0; /* We expect a contiguous header [eh, ip, tcp]. */ @@ -448,8 +449,15 @@ tcp_lro_rx(struct lro_ctrl *lc, struct m * Check TCP header constraints. */ /* Ensure no bits set besides ACK or PSH. */ - if ((th->th_flags & ~(TH_ACK | TH_PUSH)) != 0) - return (TCP_LRO_CANNOT); + if ((th->th_flags & ~(TH_ACK | TH_PUSH)) != 0) { + if (th->th_flags & TH_SYN) + return (TCP_LRO_CANNOT); + /* + * Make sure that previously seen segements/ACKs are delivered + * before this segement, e.g. FIN. + */ + force_flush = 1; + } /* XXX-BZ We lose a ACK|PUSH flag concatenating multiple segments. */ /* XXX-BZ Ideally we'd flush on PUSH? */ @@ -465,8 +473,13 @@ tcp_lro_rx(struct lro_ctrl *lc, struct m ts_ptr = (uint32_t *)(th + 1); if (l != 0 && (__predict_false(l != TCPOLEN_TSTAMP_APPA) || (*ts_ptr != ntohl(TCPOPT_NOP<<24|TCPOPT_NOP<<16| - TCPOPT_TIMESTAMP<<8|TCPOLEN_TIMESTAMP)))) - return (TCP_LRO_CANNOT); + TCPOPT_TIMESTAMP<<8|TCPOLEN_TIMESTAMP)))) { + /* + * Make sure that previously seen segements/ACKs are delivered + * before this segement. + */ + force_flush = 1; + } /* If the driver did not pass in the checksum, set it now. */ if (csum == 0x0000) @@ -500,6 +513,13 @@ tcp_lro_rx(struct lro_ctrl *lc, struct m #endif } + if (force_flush) { + /* Timestamps mismatch; this is a FIN, etc */ + SLIST_REMOVE(&lc->lro_active, le, lro_entry, next); + tcp_lro_flush(lc, le); + return (TCP_LRO_CANNOT); + } + /* Flush now if appending will result in overflow. */ if (le->p_len > (65535 - tcp_data_len)) { SLIST_REMOVE(&lc->lro_active, le, lro_entry, next); @@ -568,6 +588,14 @@ tcp_lro_rx(struct lro_ctrl *lc, struct m return (0); } + if (force_flush) { + /* + * Nothing to flush, but this segment can not be further + * aggregated/delayed. + */ + return (TCP_LRO_CANNOT); + } + /* Try to find an empty slot. */ if (SLIST_EMPTY(&lc->lro_free)) return (TCP_LRO_NO_ENTRIES);