From owner-freebsd-net@FreeBSD.ORG Mon Oct 21 20:11:49 2013 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id E818F93 for ; Mon, 21 Oct 2013 20:11:49 +0000 (UTC) (envelope-from andre@freebsd.org) Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 597B222E4 for ; Mon, 21 Oct 2013 20:11:49 +0000 (UTC) Received: (qmail 43291 invoked from network); 21 Oct 2013 20:43:46 -0000 Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2]) (envelope-sender ) by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP for ; 21 Oct 2013 20:43:46 -0000 Message-ID: <52658A79.6070705@freebsd.org> Date: Mon, 21 Oct 2013 22:11:37 +0200 From: Andre Oppermann User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 MIME-Version: 1.0 To: Colin Percival , Julian Elischer , freebsd-net@freebsd.org Subject: Re: LRO causing stretch ACK violations interacts badly with delayed ACKing References: <52605EC9.6090406@freebsd.org> <526478D0.1000601@freebsd.org> <5264869E.4000308@freebsd.org> <5265450C.1060601@freebsd.org> <526558D2.3010505@freebsd.org> <52658726.4030106@freebsd.org> In-Reply-To: <52658726.4030106@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Oct 2013 20:11:50 -0000 On 21.10.2013 21:57, Andre Oppermann wrote: > On 21.10.2013 18:39, Colin Percival wrote: >> On 10/21/13 08:15, Andre Oppermann wrote: >>> On 21.10.2013 03:42, Colin Percival wrote: >>>> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and >>>> 9.2 are behaving differently -- anyone have any ideas? >>> >>> The last time I looked our soft-LRO had a few remaining issues. One of >>> them was that in certain situations reordering may happen with segments >>> that can't be aggregated into a LRO state. The other was that the driver >>> is responsible to manage the flushing of LRO states that haven't seen >>> updates in some time. >> >> It looks like the netfront driver flushes LRO every time it finishes reading >> packets -- if anything, it's too aggressive: >> /* >> * Flush any outstanding LRO work >> */ >> while (!SLIST_EMPTY(&lro->lro_active)) { >> queued = SLIST_FIRST(&lro->lro_active); >> SLIST_REMOVE_HEAD(&lro->lro_active, next); >> tcp_lro_flush(lro, queued); >> } >> >> So unless that code is broken somehow (it looks reasonable to me) I don't think >> it's a problem of data getting stuck in soft-LRO. >> >> Looking at the TCP stack on the other hand confuses me -- I see code which seems >> to be saying that we can delay-ACK any time that we're receiving data and don't >> have a delayed ACK already pending, without any regard for the fact that we >> might be receiving 2+ MSS at once... am I missing something here? > > This is an excellent observation! Our tcp doesn't know about LRO > and I prepared the mbuf header to carry information about the number > of merged LRO segments. That's not done yet again. However a small > heuristic in tcp_input looking for segment > mss should be sufficient > for now. Let me have a look at patching it into a suitable place. Please check out the patch below. Haven't tested it myself yet though. -- Andre Index: netinet/tcp_input.c =================================================================== --- netinet/tcp_input.c (revision 256780) +++ netinet/tcp_input.c (working copy) @@ -508,10 +508,13 @@ * the ack that opens up a 0-sized window and * - delayed acks are enabled or * - this is a half-synchronized T/TCP connection. + * - the segment size is not larger than the MSS and LRO wasn't used + * for this segment. */ -#define DELAY_ACK(tp) \ +#define DELAY_ACK(tp, tlen) \ ((!tcp_timer_active(tp, TT_DELACK) && \ (tp->t_flags & TF_RXWIN0SENT) == 0) && \ + (tlen <= tp->t_maxopd) && \ (V_tcp_delack_enabled || (tp->t_flags & TF_NEEDSYN))) /* @@ -1863,7 +1866,7 @@ } /* NB: sorwakeup_locked() does an implicit unlock. */ sorwakeup_locked(so); - if (DELAY_ACK(tp)) { + if (DELAY_ACK(tp, tlen)) { tp->t_flags |= TF_DELACK; } else { tp->t_flags |= TF_ACKNOW; @@ -1954,7 +1957,7 @@ * If there's data, delay ACK; if there's also a FIN * ACKNOW will be turned on later. */ - if (DELAY_ACK(tp) && tlen != 0) + if (DELAY_ACK(tp, tlen) && tlen != 0) tcp_timer_activate(tp, TT_DELACK, tcp_delacktime); else @@ -2926,7 +2929,7 @@ if (th->th_seq == tp->rcv_nxt && LIST_EMPTY(&tp->t_segq) && TCPS_HAVEESTABLISHED(tp->t_state)) { - if (DELAY_ACK(tp)) + if (DELAY_ACK(tp, tlen)) tp->t_flags |= TF_DELACK; else tp->t_flags |= TF_ACKNOW;