From owner-freebsd-transport@freebsd.org Thu Sep 12 19:41:24 2019
Subject: Re: reno cwnd growth while app limited...
From: Randall Stewart <rrs@netflix.com>
Date: Thu, 12 Sep 2019 15:41:19 -0400
To: "Scheffenegger, Richard"
Cc: Michael Tuexen, Lawrence Stewart, Jonathan Looney, "freebsd-transport@freebsd.org", "Cui, Cheng", Tom Jones, "bz@freebsd.org", "Eggert, Lars"
Message-Id: <1E528E13-B087-4318-A9CA-F08957B3F03A@netflix.com>
List-Id: Discussions of transport level network protocols in FreeBSD
Richard,

Yes, that is something one could do, i.e. NewCWV… We had patches for it at one time, but they were largely untested.

Pacing of course changes all of this as well; BBR, for example, does not worry about this, since it is pacing the packets, so you never get a burst. We will soon be committing an updated RACK stack that also has this ability.

R

> On Sep 12, 2019, at 2:49 PM, Scheffenegger, Richard wrote:
>
> Michael,
>
> Thanks a lot for pointing out the uperf utility - this could be configured exactly in the way I wanted to demonstrate this...
>
> In the graphs below, I traced the evolution of cwnd for a flow across the loopback in a VM.
>
> The application does 3060 writes of 10 kB each, pausing 10 ms between writes (well below the 230 ms minimum TCP idle period); once that phase is over, it floods the session with 8 writes of 10 MB each.
>
> Currently, the stack will initially grow cwnd up to the limit set by the receiver's window (set to 1.2 MB) during the low-bandwidth phase, where no loss occurs...
>
> Thus the application can send out a massive burst of data in a single RTT (or at line rate) when it chooses to do so...
>
> Using the guidance given by NewCWV (RFC 7661), and growing cwnd only when flightsize is larger than half of cwnd, the congestion window remains in more reasonable ranges during the application-limited phase, thus limiting the maximum burst size.
>
> Growth of cwnd in SS or CA is otherwise normal, but the inverse case (an application transitioning from high throughput to low) is not addressed; I wonder if a reduction could be achieved without the timer infrastructure described in RFC 7661 (e.g.
> reducing cwnd by 1 MSS when flightsize is < ½ cwnd, while not doing recovery…
>
> Unlimited ssthresh:
> [graph images not preserved in the archive]
>
> Richard Scheffenegger
> Consulting Solution Architect
> NAS & Networking
>
> NetApp
> +43 1 3676 811 3157 Direct Phone
> +43 664 8866 1857 Mobile Phone
> Richard.Scheffenegger@netapp.com
>
> https://ts.la/richard49892
>
> -----Original Message-----
> From: Randall Stewart
> Sent: Wednesday, 11 September 2019 14:18
> To: Scheffenegger, Richard
> Cc: Lawrence Stewart; Michael Tuexen; Jonathan Looney; freebsd-transport@freebsd.org; Cui, Cheng; Tom Jones; bz@freebsd.org; Eggert, Lars
> Subject: Re: reno cwnd growth while app limited...
>
> Interesting graph :)
>
> I know that years ago I had a discussion along these lines (talking about burst limits) with Kacheong Poon and Mark Allman. IIRC, Kacheong said that, at that time, Sun limited the cwnd to something like 4 MSS more than the flight size (I could have that mixed up, though, and it might have been Mark proposing that... it's been a while; Sun was still a company then :D).
>
> On the other hand, I am not sure that such a tight limit takes into account all of the ACK artifacts that seem to be rampant in the Internet now. BBR took the approach of limiting its cwnd to 2x BDP (or at least what it thought was the BDP), which is more along the lines of your 0.5 if I am reading you right.
> It might be something worth looking into, but I would want to contemplate it for a while :)
>
> R
>
> > On Sep 11, 2019, at 8:04 AM, Scheffenegger, Richard wrote:
> >
> > Hi,
> >
> > I was just looking at some graph data, running two parallel DCTCP flows against a cubic receiver (some internal validation) with traditional ECN feedback.
> >
> > Now, in the beginning, a single flow cannot overutilize the link capacity and never runs into any loss/mark… but snd_cwnd grows unbounded (since DCTCP uses the newreno "cc_ack_received" mechanism).
> >
> > However, newreno_ack_received only grows snd_cwnd when CCF_CWND_LIMITED is set, and that flag remains set as long as snd_cwnd < snd_wnd (the receive window signaled by the receiver).
> >
> > But is this still* the correct behavior?
> >
> > Say the data flow rate is application limited (every n milliseconds, a few kB), and the receiver has signaled a large window: cwnd will grow until it matches the receiver's window. If the application then chooses to no longer restrict itself, it could burst out significantly more data than the queuing of the path can handle…
> >
> > So, shouldn't there be a second condition for cwnd growth, e.g. that pipe (flightsize) is close to cwnd (factor 0.5 during slow start, and say 0.85 during congestion avoidance), to prevent sudden large bursts when a flow comes out of being application limited?
> > The intention here would be to restrict the worst-case burst that could be sent out (which is dealt with differently in other stacks), so that it ideally still fits into the path's queues…
> >
> > RFC 5681 is silent on application-limited flows, though (but one could think of an application limiting a flow as another form of congestion, during which cwnd shouldn't grow…).
> >
> > In the example above, growing cwnd to about 500 kB and then remaining there would be approximately the expected setting, based on the average of two competing flows hovering at around 200-250 kB…
> >
> > *) I'm referring to the much higher likelihood nowadays that the application's own pacing and transfer volume violate a design principle of TCP, where the implicit assumption was that the sender has unlimited data to send, with the timing controlled at the full discretion of TCP.
> >
> > Richard Scheffenegger
> > Consulting Solution Architect
> > NAS & Networking
> >
> > NetApp
> > +43 1 3676 811 3157 Direct Phone
> > +43 664 8866 1857 Mobile Phone
> > Richard.Scheffenegger@netapp.com
> >
> > #DataDriven
> >
> > https://ts.la/richard49892
>
> ------
> Randall Stewart
> rrs@netflix.com

------
Randall Stewart
rrs@netflix.com