Date:      Tue, 18 Mar 2014 23:28:33 -0400
From:      George Neville-Neil <gnn@neville-neil.com>
To:        "Eggert, Lars" <lars@netapp.com>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Midori Kato <katoon@sfc.wide.ad.jp>
Subject:   Re: DCTCP for FreeBSD
Message-ID:  <273DE766-0AAA-4DB3-A3EA-1CFBDE0DBB4F@neville-neil.com>
In-Reply-To: <BE8726B3-7AC9-4CB8-8D12-E05F54AB59AB@netapp.com>
References:  <BE8726B3-7AC9-4CB8-8D12-E05F54AB59AB@netapp.com>




On Feb 19, 2014, at 4:18 , Eggert, Lars <lars@netapp.com> wrote:

> Hi,
>
> Midori Kato has implemented Microsoft's/Stanford's Datacenter TCP
> (DCTCP) for FreeBSD as part of her MS thesis with me. Find a patch
> attached.
>

Thanks!  Any hints on how best to test this code?

Best,
George

> Also note that we're documenting a specification for DCTCP in an IETF
> draft: http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
>
> Microsoft has made a licensing statement (RAND-Z) on the technology to
> the IETF: https://datatracker.ietf.org/ipr/2319/ (I'm not sure what
> this means for an eventual inclusion in FreeBSD.)
>
> Roughly, Midori's patch consists of an extension of the modular
> congestion control framework to expose ECN information to the modules,
> a module to implement DCTCP, and a few experimental variants. See
> Midori's explanation:
>
>> [1] A change to the modular congestion control framework (see
>> Section 4.1 if needed)
>> DCTCP processes ECN differently from RFC 3168. We need three
>> functions to do the following ECN processing:
>> a) The kernel decides whether an ECE flag should be set in the next
>> outgoing TCP segment by snooping reserved bits in the IP and TCP
>> headers. (tcp_input.c)
>> b) The kernel reacts to congestion if an ECE flag is set in an
>> arriving TCP segment. (tcp_input.c)
>> c) After the outgoing TCP segment is generated, the kernel decides
>> whether an ECT bit should be set in the ECN field of the IP header of
>> the outgoing packet. (tcp_output.c)
>> The current framework has no housekeeping functions for (a) and (b).
>> Therefore, I added two functions to the modular cc framework:
>> ecnpkt_handler() and ect_handler().
>>
>> - ecnpkt_handler() allows the kernel to do additional ECN processing
>> by snooping the ECN field in the IP and TCP headers. As an option,
>> this function takes a flag that tells whether it is called during
>> delayed ACK. This function returns an integer value. When the return
>> value is set, the kernel is forced to disable delayed ACK.
>> - ect_handler() allows the kernel to use a different rule from
>> RFC 3168 for ECT marking of the outgoing segment. This function
>> returns an integer value. If the value is set, an ECT bit is set in
>> the outgoing segment.
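
To make the two hooks described above easier to picture, here is a minimal C sketch of how they could be shaped. It is not taken from dctcp.patch or from the FreeBSD cc framework: `struct conn_state`, `struct cc_hooks`, the argument lists, and the treatment of "set" as "nonzero" are all assumptions made for illustration. The receiver rule shown (ask for an immediate ACK when the CE mark changes) is a simplified reading of DCTCP, and the "mark every segment" ECT rule is only an example of a rule that differs from RFC 3168.

```c
#include <stdbool.h>
#include <stddef.h>

/* ECN codepoints carried in the two ECN bits of the IP header (RFC 3168). */
enum ecn_codepoint {
	ECN_NOT_ECT = 0,	/* sender is not ECN-capable */
	ECN_ECT1    = 1,
	ECN_ECT0    = 2,
	ECN_CE      = 3		/* congestion experienced */
};

/* Hypothetical per-connection state, not the real struct cc_var. */
struct conn_state {
	bool last_ce;	/* was the previous segment CE-marked? */
	bool send_ece;	/* set ECE on the next outgoing ACK */
};

/* Hypothetical hook table holding the two handlers described above. */
struct cc_hooks {
	int (*ecnpkt_handler)(struct conn_state *cs,
	    enum ecn_codepoint ip_ecn, bool in_delayed_ack);
	int (*ect_handler)(struct conn_state *cs, bool is_retransmit,
	    size_t payload_len);
};

/* Simplified DCTCP-style receiver rule (an assumption, see lead-in). */
int
dctcp_ecnpkt_handler(struct conn_state *cs, enum ecn_codepoint ip_ecn,
    bool in_delayed_ack)
{
	bool ce = (ip_ecn == ECN_CE);
	bool changed = (ce != cs->last_ce);

	cs->last_ce = ce;
	cs->send_ece = ce;	/* echo the current CE state in the ACK */

	/* Nonzero asks the caller to skip delayed ACK for this segment. */
	return ((changed && in_delayed_ack) ? 1 : 0);
}

/* RFC 3168 rule: ECT on new data only, never on pure ACKs or retransmits. */
int
rfc3168_ect_handler(struct conn_state *cs, bool is_retransmit,
    size_t payload_len)
{
	(void)cs;
	return ((payload_len > 0 && !is_retransmit) ? 1 : 0);
}

/* Illustrative alternative rule: mark every outgoing segment as ECT. */
int
mark_all_ect_handler(struct conn_state *cs, bool is_retransmit,
    size_t payload_len)
{
	(void)cs;
	(void)is_retransmit;
	(void)payload_len;
	return (1);
}

/* A module would export its choice of handlers through the table. */
const struct cc_hooks dctcp_like_hooks = {
	.ecnpkt_handler = dctcp_ecnpkt_handler,
	.ect_handler = mark_all_ect_handler
};
```

In this sketch the input path would call `ecnpkt_handler` once per arriving segment and the output path would call `ect_handler` once per generated segment, so a module only has to swap function pointers to change ECN behavior.
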
>>
>> [2] Five changes from the original DCTCP algorithm
>> In order to reflect the DCTCP motivation, I modified the following
>> processing. The first four modifications are for senders and the last
>> modification is for receivers.
>>
>> (1) no congestion recovery on receipt of ECE flags (see section
>> 4.2.1 if needed)
>> FreeBSD handles ECN as a congestion event, but that does not hold for
>> DCTCP senders. A DCTCP sender uses ECN as a means to understand the
>> extent of congestion. Therefore, I removed congestion recovery mode
>> in all situations for DCTCP senders.
>>
>> (2) selectable initial alpha value (see section 4.2.2 if needed)
>> DCTCP defines alpha as a parameter that reflects the depth of
>> congestion. When the alpha value is large, a DCTCP sender shows
>> saw-toothed CWND behavior.
>> A problem is that the alpha value is not reliable during the first
>> dozen or so RTTs, because there is no way to know the depth of
>> congestion in a network from the beginning. Considering alpha
>> reliability, I think users should be able to select the initial alpha
>> per application. When a user chooses DCTCP for latency-sensitive
>> applications, the initial alpha is preferred. Otherwise, my
>> experimental results suggest that DCTCP senders should set the
>> initial alpha value to zero (see section 7.2 of the attached file).
>> The default alpha value is set to zero in my implementation.
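
As background for the alpha discussion above, here is a minimal C sketch of the alpha estimator and the window reduction. The two formulas, alpha = (1 - g) * alpha + g * F and cwnd = cwnd * (1 - alpha / 2), come from the published DCTCP design, not from this message. Everything else is an assumption for illustration: the fixed-point scale of 1024, g = 1/16, the struct and function names, and the per-window byte counters are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

#define ALPHA_SCALE	1024u	/* alpha == 1.0 is stored as 1024 */
#define G_SHIFT		4u	/* estimation gain g = 1/16 */

/* Hypothetical sender state for one connection. */
struct dctcp_state {
	uint32_t alpha;		/* fixed-point estimate, 0..ALPHA_SCALE */
	uint32_t bytes_acked;	/* bytes ACKed in the current window */
	uint32_t bytes_marked;	/* of those, bytes ACKed with ECE set */
};

/* Start from a caller-selected initial alpha (0 or ALPHA_SCALE, say). */
void
dctcp_init(struct dctcp_state *s, uint32_t initial_alpha)
{
	s->alpha = initial_alpha > ALPHA_SCALE ? ALPHA_SCALE : initial_alpha;
	s->bytes_acked = 0;
	s->bytes_marked = 0;
}

/* Once per window: alpha = (1 - g) * alpha + g * F. */
void
dctcp_update_alpha(struct dctcp_state *s)
{
	uint32_t frac = 0;	/* F, the marked fraction, in fixed point */

	if (s->bytes_acked > 0)
		frac = (uint32_t)(((uint64_t)s->bytes_marked * ALPHA_SCALE) /
		    s->bytes_acked);
	if (frac > ALPHA_SCALE)
		frac = ALPHA_SCALE;

	s->alpha = s->alpha - (s->alpha >> G_SHIFT) + (frac >> G_SHIFT);
	if (s->alpha > ALPHA_SCALE)
		s->alpha = ALPHA_SCALE;

	/* Start counting afresh for the next window. */
	s->bytes_acked = 0;
	s->bytes_marked = 0;
}

/* Window reduction: cwnd = cwnd * (1 - alpha / 2). */
uint32_t
dctcp_reduce_cwnd(const struct dctcp_state *s, uint32_t cwnd)
{
	return (cwnd - (uint32_t)(((uint64_t)cwnd * s->alpha) /
	    (2u * ALPHA_SCALE)));
}
```

With an initial alpha of `ALPHA_SCALE` the first reduction halves CWND, as standard ECN would, while an initial alpha of zero leaves CWND almost untouched until marks have accumulated over several windows. That is the trade-off the selectable initial value exposes.
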
>>
>> (3) alpha value initialization after an idle period (see section
>> 4.2.3 if needed)
>> The length of an idle period is not predictable. Therefore, for a
>> DCTCP sender, using an outdated alpha after an idle period is not a
>> good idea. A DCTCP sender resets alpha to the initial value when an
>> idle period occurs.
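
The idle reset described above could look roughly like the following C sketch. The names are hypothetical and not taken from the patch. Two details are assumptions of mine rather than statements from the message: that "idle" means no activity for at least one retransmission timeout, and that the per-window byte counters are cleared together with alpha.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state; initial_alpha is the user-selected value. */
struct dctcp_idle_state {
	uint32_t alpha;
	uint32_t initial_alpha;
	uint32_t bytes_acked;
	uint32_t bytes_marked;
};

/* Assumed idle test: no activity for at least one RTO, in ticks. */
bool
conn_was_idle(uint32_t now_ticks, uint32_t last_activity_ticks,
    uint32_t rto_ticks)
{
	/* Unsigned subtraction stays correct across tick wraparound. */
	return ((uint32_t)(now_ticks - last_activity_ticks) >= rto_ticks);
}

/* Forget the stale estimate and start again from the initial alpha. */
void
dctcp_after_idle(struct dctcp_idle_state *s)
{
	s->alpha = s->initial_alpha;
	s->bytes_acked = 0;	/* assumption: counters restart as well */
	s->bytes_marked = 0;
}
```

A sender would check `conn_was_idle()` before transmitting new data and call `dctcp_after_idle()` when it returns true.
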
>>
>> The following changes are applied to eliminate a compatibility issue
>> with standard ECN as defined in RFC 3168. DCTCP and standard ECN
>> servers have no way to identify which mechanism is running on the
>> peer. Thus, we need to eliminate the worst case in a network that
>> mixes DCTCP senders/receivers and standard ECN senders/receivers.
>> (4) using the CWR flag when the ECE flag is found for an RTT (see
>> section 5.1 if needed)
>> This change applies to the situation where a sender uses DCTCP and a
>> receiver uses standard ECN.
>> In that situation, I find that a DCTCP sender minimizes CWND. The
>> detailed technical reason is described in section 4.2 of my attached
>> file. Fortunately, the current tcp_input() function already
>> accommodates this change, so there is no modification for it in my
>> patch.
>>
>> (5) enabling delayed ACK on receipt of the CWR flag (see section
>> 5.2 if needed)
>> This change applies to the situation where a sender uses standard
>> ECN and a receiver uses DCTCP. In that situation, I find that without
>> this change a standard ECN sender increases CWND less than expected.
>> The detailed technical reason is described in section 5.2 of my
>> attached file.
>
>
> The patch is attached and should apply to a recent -CURRENT. Midori's
> thesis (which she refers to in the quoted text above) is at
> https://eggert.org/students/kato-thesis.pdf
>
> Lars
>
> <dctcp.patch>

