Date: Tue, 18 Mar 2014 23:28:33 -0400
From: George Neville-Neil <gnn@neville-neil.com>
To: "Eggert, Lars" <lars@netapp.com>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Midori Kato <katoon@sfc.wide.ad.jp>
Subject: Re: DCTCP for FreeBSD
Message-ID: <273DE766-0AAA-4DB3-A3EA-1CFBDE0DBB4F@neville-neil.com>
In-Reply-To: <BE8726B3-7AC9-4CB8-8D12-E05F54AB59AB@netapp.com>
References: <BE8726B3-7AC9-4CB8-8D12-E05F54AB59AB@netapp.com>

On Feb 19, 2014, at 4:18, Eggert, Lars <lars@netapp.com> wrote:

> Hi,
>
> Midori Kato has implemented Microsoft's/Stanford's Datacenter TCP (DCTCP) for FreeBSD as part of her MS thesis with me. Find a patch attached.
>

Thanks! Any hints on how best to test this code?

Best,
George

> Also note that we're documenting a specification for DCTCP in an IETF draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
>
> Microsoft has made a licensing statement (RAND-Z) on the technology to the IETF: https://datatracker.ietf.org/ipr/2319/ (I'm not sure what this means for an eventual inclusion in FreeBSD.)
>
> Roughly, Midori's patch consists of an extension of the modular congestion control framework to expose ECN information to the modules, a module to implement DCTCP, and a few experimental variants. See Midori's explanation:
>
>> [1] A change to the modular congestion control framework (see Section 4.1 if needed)
>> DCTCP processes ECN differently from RFC 3168. We need three functions to do the following ECN processing.
>> a) The kernel decides whether the ECE flag should be set in the next outgoing TCP segment by snooping reserved bits in the IP and TCP headers. (tcp_input.c)
>> b) The kernel performs congestion control if the ECE flag is set in an arriving TCP segment. (tcp_input.c)
>> c) After the outgoing TCP segment is generated, the kernel decides whether the ECT bit should be set in the ECN field of the IP header of the outgoing packet. (tcp_output.c)
>> The current framework has no housekeeping functions for (a) and (b). Therefore, I add two functions to the modular cc framework: ecnpkt_handler() and ect_handler().
>>
>> - ecnpkt_handler() allows the kernel to do the additional ECN processing by snooping the ECN field in the IP and TCP headers.
>>   As an option, this function takes a flag that tells whether it is called during delayed ACK. The function returns an integer value. When the return value is set, the kernel is forced to disable delayed ACK.
>> - ect_handler() allows the kernel to use a different rule from RFC 3168 for ECT marking in the outgoing segment. This function returns an integer value. If the value is set, the ECT bit is set in the outgoing segment.
>>
>>
>> [2] Five changes from the original DCTCP algorithm
>> To reflect the DCTCP motivation, I modified the following processing. The first four modifications are for senders and the last modification is for receivers.
>>
>> (1) no congestion recovery on receipt of ECE flags (see section 4.2.1 if needed)
>> FreeBSD handles ECN as a congestion event, but that is not true for DCTCP senders. A DCTCP sender uses ECN as a means to understand the extent of congestion. Therefore, I removed congestion recovery mode in all situations for DCTCP senders.
>>
>> (2) selectable initial alpha value (see section 4.2.2 if needed)
>> DCTCP defines alpha as a parameter that reflects the depth of congestion. When the alpha value is large, it gives a DCTCP sender a saw-toothed CWND behavior.
>> A problem is that the alpha value is not reliable for a dozen RTTs, because there is no way to identify the depth of congestion in a network from the beginning. Considering the reliability of alpha, I think the initial alpha should be selectable by users according to their applications. When a user chooses DCTCP for latency-sensitive applications, the initial alpha is preferred. Otherwise, DCTCP senders had better set the initial alpha value to zero, based on my experimental results (see section 7.2 of the attached file).
>> The default alpha value is set to zero in my implementation.
>>
>> (3) alpha value initialization after an idle period (see section 4.2.3 if needed)
>> How long an idle period lasts is not predictable. Therefore, for a DCTCP sender, using the outdated alpha after an idle period is not a good idea. A DCTCP sender resets alpha to the initial value when an idle period occurs.
>>
>> The following changes are applied to eliminate a compatibility issue with standard ECN as defined in RFC 3168. DCTCP and standard ECN servers have no way to identify which mechanism is running on the peer. Thus, we need to eliminate the worst case in a network that mixes DCTCP senders/receivers and standard ECN senders/receivers.
>> (4) using the CWR flag when the ECE flag is found for an RTT (see section 5.1 if needed)
>> This change applies when the sender uses DCTCP and the receiver uses standard ECN.
>> In that situation, I found that a DCTCP sender minimizes CWND. The detailed technical reason is described in section 4.2 of my attached file. Fortunately, the current tcp_input() function already covers this change, so there is no modification for it in my patch.
>>
>> (5) enabling delayed ACK on receipt of the CWR flag (see section 5.2 if needed)
>> This change applies when the sender uses standard ECN and the receiver uses DCTCP. In that situation, I found that a standard ECN sender increases CWND by less than expected without this change. The detailed technical reason is described in section 5.2 of my attached file.
>
>
> The patch is attached and should apply to a recent -CURRENT.
> Midori's thesis (which she refers to in the quoted text above) is at https://eggert.org/students/kato-thesis.pdf
>
> Lars
>
> <dctcp.patch>
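
The alpha handling described in items (2) and (3) of the quoted explanation can be sketched in plain C as follows. This is a minimal user-space sketch, not code from Midori's patch: the struct, the function names, and the fixed-point scaling are hypothetical. The update rule alpha = (1 - g) * alpha + g * F with g = 1/16, and the window reduction cwnd * (1 - alpha / 2), come from the published DCTCP algorithm rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; they do not come from the patch. */
#define DCTCP_SHIFT   10u                   /* alpha is fixed point: 1.0 == 1 << 10 */
#define DCTCP_ONE     (1u << DCTCP_SHIFT)
#define DCTCP_G_SHIFT 4u                    /* estimation gain g = 1/16 */

struct dctcp_state {
    uint32_t alpha;          /* current congestion estimate, 0..DCTCP_ONE */
    uint32_t initial_alpha;  /* user-selectable starting value, item (2) */
};

/* Item (2): start from a selectable initial alpha (zero is the stated default). */
static void
dctcp_init(struct dctcp_state *s, uint32_t initial_alpha)
{
    assert(initial_alpha <= DCTCP_ONE);
    s->initial_alpha = initial_alpha;
    s->alpha = initial_alpha;
}

/* Item (3): after an idle period the old estimate is stale, so reset it. */
static void
dctcp_after_idle(struct dctcp_state *s)
{
    s->alpha = s->initial_alpha;
}

/*
 * Once per window of data: alpha = (1 - g) * alpha + g * F,
 * where F is the fraction of bytes that were ECN-marked.
 */
static void
dctcp_update_alpha(struct dctcp_state *s, uint64_t bytes_marked,
    uint64_t bytes_total)
{
    uint32_t frac;

    if (bytes_total == 0)
        return;                             /* no data sent: keep alpha as is */
    if (bytes_marked > bytes_total)
        bytes_marked = bytes_total;
    frac = (uint32_t)((bytes_marked << DCTCP_SHIFT) / bytes_total);
    s->alpha = s->alpha - (s->alpha >> DCTCP_G_SHIFT) +
        (frac >> DCTCP_G_SHIFT);
}

/* Window reduction on congestion: cwnd = cwnd * (1 - alpha / 2). */
static uint32_t
dctcp_reduce_cwnd(const struct dctcp_state *s, uint32_t cwnd)
{
    uint64_t cut = ((uint64_t)cwnd * s->alpha) >> (DCTCP_SHIFT + 1u);

    return cwnd - (uint32_t)cut;
}
```

With alpha at 0 the window is left unchanged, and with alpha at 1.0 (DCTCP_ONE) it is halved, which matches a standard ECN response. That is why the choice of initial alpha matters during the first RTTs, before the estimate has converged.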