From: George Neville-Neil
Date: Tue, 18 Mar 2014 23:28:33 -0400
Subject: Re: DCTCP for FreeBSD
To: "Eggert, Lars"
Cc: "freebsd-net@freebsd.org", Midori Kato
Message-Id: <273DE766-0AAA-4DB3-A3EA-1CFBDE0DBB4F@neville-neil.com>
List-Id: Networking and TCP/IP with FreeBSD

On Feb 19, 2014, at 4:18, Eggert, Lars wrote:

> Hi,
>
> Midori Kato has implemented Microsoft's/Stanford's Datacenter TCP (DCTCP) for FreeBSD as part of her MS thesis with me. Find a patch attached.
>

Thanks! Any hints on how best to test this code?

Best,
George

> Also note that we're documenting a specification for DCTCP in an IETF draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
>
> Microsoft has made a licensing statement (RAND-Z) on the technology to the IETF: https://datatracker.ietf.org/ipr/2319/ (I'm not sure what this means for an eventual inclusion in FreeBSD.)
>
> Roughly, Midori's patch consists of an extension of the modular congestion control framework to expose ECN information to the modules, a module to implement DCTCP, and a few experimental variants. See Midori's explanation:
>
>> [1] A change for the modular congestion control framework (See Section 4.1 if needed)
>> DCTCP uses different ECN processing from RFC 3168. We need to prepare three functions to do the following ECN processing.
>> a) The kernel decides whether an ECE flag should be set in the next outgoing TCP segment by snooping reserved bits in IP and TCP headers. (tcp_input.c)
>> b) The kernel reacts to congestion if an ECE flag is set in an arriving TCP segment. (tcp_input.c)
>> c) After the outgoing TCP segment is generated, the kernel decides whether an ECT bit should be set in the ECN field of the IP header in the outgoing packet. (tcp_output.c)
>> The current framework has no housekeeping functions for (a) and (b). Therefore, I add two functions to the modular cc framework: ecnpkt_handler() and ect_handler().
>>
>> - ecnpkt_handler() allows the kernel to do additional ECN processing by snooping the ECN field in the IP and TCP headers. As an option, this function takes a flag that tells whether it is being called in the delayed ACK context. It returns an integer value. When the return value is set, the kernel forcibly disables delayed ACK.
>> - ect_handler() allows the kernel to use a different rule from RFC 3168 for ECT marking of the outgoing segment. It returns an integer value. If the value is set, an ECT bit is set in the outgoing segment.
>>
>>
>> [2] Five changes from the original DCTCP algorithm
>> To reflect the DCTCP motivation, I modified the following processing. The first four modifications are for senders and the last modification is for receivers.
>>
>> (1) no congestion recovery on receipt of ECE flags (See section 4.2.1 if needed)
>> FreeBSD handles ECN as a congestion event, but that is not true for DCTCP senders. A DCTCP sender uses ECN as a means to understand the extent of congestion. Therefore, I removed congestion recovery mode in all situations for DCTCP senders.
>>
>> (2) selective initial alpha value (See section 4.2.2 if needed)
>> DCTCP defines alpha as a parameter that reflects the depth of congestion. When the alpha value is large, it produces a saw-toothed CWND behavior in a DCTCP sender.
>> A problem is that the alpha value is not reliable for the first dozen or so RTTs because there is no way to identify the depth of congestion in a network from the beginning. Considering alpha's reliability, I think the initial alpha should be selectable by users according to the application. When a user chooses DCTCP for latency-sensitive applications, the initial alpha is preferred. Otherwise, DCTCP senders had better set the initial alpha value to zero, according to my experimental results (See section 7.2 of the attached file).
>> The default alpha value is set to zero in my implementation.
>>
>> (3) alpha value initialization after an idle period (See section 4.2.3 if needed)
>> How long an idle period lasts is not predictable. Therefore, for a DCTCP sender, using the outdated alpha after an idle period is not a good idea. A DCTCP sender resets alpha to the initial value when an idle period occurs.
>>
>> The following changes are applied to eliminate a compatibility issue with standard ECN as defined in RFC 3168. DCTCP and standard ECN servers have no way to identify which mechanism is working on the peer. Thus, we need to eliminate the worst situation in a network mixing DCTCP senders/receivers and standard ECN senders/receivers.
>> (4) using CWR flag when the ECE flag is found for a RTT (See section 5.1 if needed)
>> This change applies to a situation where the sender uses DCTCP and the receiver uses standard ECN.
>> In that situation, I find that a DCTCP sender minimizes CWND. The detailed technical reason is described in section 4.2 of my attached file. Fortunately, the current tcp_input() function already complements this change; thus, there is no modification in my patch.
>>
>> (5) enabling delayed ACK on receipt of the CWR flag (See section 5.2 if needed)
>> This change applies to a situation where the sender uses standard ECN and the receiver uses DCTCP. In that situation, I find that a standard ECN sender increases CWND by less than expected without this change. The detailed technical reason is described in section 5.2 of my attached file.
>
>
> The patch is attached and should apply to a recent -CURRENT.
> Midori's thesis (which she refers to in the quoted text above) is at https://eggert.org/students/kato-thesis.pdf
>
> Lars
>
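
On the testing question, one possible starting point is a small user-space reference model of the alpha handling described in items (2) and (3) of the quoted text, against which kernel behaviour could be compared. The following is only a sketch under stated assumptions and is not code from Midori's patch: the names dctcp_state, dctcp_init(), dctcp_update_alpha(), dctcp_reduce_cwnd() and dctcp_after_idle() are hypothetical, the gain of 1/16 is an assumed value, and floating point is used for readability where kernel code would more likely use fixed point. The update rules follow the usual DCTCP description: alpha = (1 - g) * alpha + g * F once per window, and cwnd = cwnd * (1 - alpha / 2) when marks were seen.

```c
/*
 * Hypothetical per-connection state for a user-space DCTCP model.
 * None of these names come from the patch discussed in this thread.
 */
struct dctcp_state {
    double alpha;      /* running estimate of the marked fraction, 0..1 */
    double init_alpha; /* user-selected initial alpha (item 2) */
    double gain;       /* EWMA weight g; 1/16 is an assumption */
    double cwnd;       /* congestion window in bytes */
};

/* Start a connection with the chosen initial alpha and window. */
static void
dctcp_init(struct dctcp_state *s, double init_alpha, double cwnd)
{
    s->init_alpha = init_alpha;
    s->alpha = init_alpha;
    s->gain = 1.0 / 16.0;
    s->cwnd = cwnd;
}

/*
 * Called once per window of data:
 *   F     = bytes_marked / bytes_acked
 *   alpha = (1 - g) * alpha + g * F
 */
static void
dctcp_update_alpha(struct dctcp_state *s, unsigned long bytes_acked,
    unsigned long bytes_marked)
{
    double frac = 0.0;

    if (bytes_acked != 0)
        frac = (double)bytes_marked / (double)bytes_acked;
    s->alpha = (1.0 - s->gain) * s->alpha + s->gain * frac;
}

/* Window reduction when marks were seen: cwnd = cwnd * (1 - alpha / 2). */
static void
dctcp_reduce_cwnd(struct dctcp_state *s)
{
    s->cwnd *= 1.0 - s->alpha / 2.0;
}

/* Item 3: after an idle period, fall back to the initial alpha. */
static void
dctcp_after_idle(struct dctcp_state *s)
{
    s->alpha = s->init_alpha;
}
```

With the default initial alpha of zero, one fully marked window moves alpha to 1/16, so the following reduction shrinks a 10000-byte window to 9687.5 bytes. Calling dctcp_after_idle() then returns alpha to zero, which matches the reset behaviour described in item (3).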