From: "Scheffenegger, Richard"
Date: Tue, 31 May 2011 09:39:51 +0100
Subject: Re: [CFT] Early Retransmit for TCP (rfc5827) patch
List-Id: Networking and TCP/IP with FreeBSD

Hi
Weongyo,

Good to know that you are addressing the primary reason for retransmission timeouts with SACK. (Small window (early retransmit) accounts for ~70%, lost retransmission for ~25%, and end-of-stream loss for ~5% of all addressable causes of an RTO.)

I looked at your code to enable RFC5827 Early Retransmit. There is one minor nit-pick:

tcp_input calls tcp_getrexmtthresh for every duplicate ACK. When SACK is enabled (over 90% of all sessions today), the byte-based tcp_sack_ownd routine cycles over the entire SACK scoreboard. As the scoreboard can become huge with fat, long pipes, this appears to be suboptimal.

Perhaps something along these lines:

    tcp_seq mark = tp->snd_una;

    ackedbyte = 0;
    TAILQ_FOREACH(p, &tp->snd_holes, scblink) {
            /* Bytes between the previous hole and this one are SACKed. */
            ackedbyte += p->start - mark;
            if (ackedbyte >= amount)
                    return (TRUE);
            mark = p->end;
    }
    /* SACKed bytes between the last hole and snd_fack. */
    ackedbyte += tp->snd_fack - mark;
    if (ackedbyte >= amount)
            return (TRUE);
    return (FALSE);

This would be more scalable: only the holes at the start of the list need to be walked, which increases the chances that they stay in cache close to the CPU.

Perhaps adding a variable to the scoreboard that tracks the number of bytes SACKed (updated on receipt of each new SACK block) would be even more efficient.

Best regards,
Richard Scheffenegger

From: weongyo@freebsd.org
Date: Sat May 7 00:19:38 UTC 2011

Hello all,

I'd like to send another patch to support RFC5827 in the TCP stack, which can be found at:

http://people.freebsd.org/~weongyo/patch_20110506_rfc5827.diff

This patch supports both Early Retransmit variants (Byte-Based Early Retransmit and Segment-Based Early Retransmit) when the net.inet.tcp.rfc5827 sysctl knob is turned on.

Please note that the Segment-Based Early Retransmit logic is split out into a khelp module because it adds extra operations and needs additional variable space to track segment boundaries on the right side of the window. So if the khelp module is loaded, the segment-based logic is preferred; if not, the default logic is `Byte-Based Early Retransmit'.
I based this on DragonFly BSD's implementation, but it did not appear to match the RFC specification as I understand it, so I changed most parts. In my test environments it appears to work correctly.

Please review and test my work and tell me if you have any concerns or questions.

regards,
Weongyo Jeong

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_20110506_rfc5827.diff
Type: text/x-diff
Size: 18455 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110507/90f2f164/patch_20110506_rfc5827.bin
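
For anyone who wants to experiment with the early-exit scoreboard walk suggested in the reply, here is a minimal userland sketch. It is not kernel code and not part of the patch: `struct hole`, `struct conn` and `sacked_at_least` are hypothetical stand-ins for the kernel's hole list and control block, and the sketch assumes the hole list is sorted by sequence number and that `snd_fack` is at or beyond the end of the last hole.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for one scoreboard hole (a range not yet SACKed). */
struct hole {
	uint32_t start;			/* first byte of the hole */
	uint32_t end;			/* one past the last byte of the hole */
	TAILQ_ENTRY(hole) scblink;
};
TAILQ_HEAD(hole_list, hole);

/* Hypothetical stand-in for the few control block fields used here. */
struct conn {
	uint32_t snd_una;		/* oldest unacknowledged byte */
	uint32_t snd_fack;		/* one past the highest SACKed byte */
	struct hole_list snd_holes;	/* holes, sorted by sequence number */
};

/*
 * Return true as soon as at least `amount` SACKed bytes have been found.
 * The walk stops at the first hole where the running total reaches the
 * threshold, so a long scoreboard is not traversed in full.
 * Unsigned 32-bit subtraction keeps the arithmetic wrap-safe.
 */
static bool
sacked_at_least(const struct conn *tp, uint32_t amount)
{
	const struct hole *p;
	uint32_t sacked = 0;
	uint32_t mark = tp->snd_una;

	TAILQ_FOREACH(p, &tp->snd_holes, scblink) {
		/* Bytes between the previous hole and this one are SACKed. */
		sacked += p->start - mark;
		if (sacked >= amount)
			return (true);
		mark = p->end;
	}
	/* SACKed bytes between the last hole and snd_fack. */
	sacked += tp->snd_fack - mark;
	return (sacked >= amount);
}
```

With snd_una = 1000, holes [1000, 2000) and [3000, 4000), and snd_fack = 5000, a threshold of 1000 is already met at the second hole, so the walk stops there; a threshold of 2000 needs the final snd_fack step, and anything above 2000 is not met.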