Date: Thu, 12 Sep 2019 15:41:19 -0400
From: Randall Stewart <rrs@netflix.com>
To: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc: Michael Tuexen <tuexen@FreeBSD.org>, Lawrence Stewart <lstewart@netflix.com>, Jonathan Looney <jtl@netflix.com>, "freebsd-transport@freebsd.org" <freebsd-transport@freebsd.org>, "Cui, Cheng" <Cheng.Cui@netapp.com>, Tom Jones <thj@freebsd.org>, "bz@freebsd.org" <bz@freebsd.org>, "Eggert, Lars" <lars@netapp.com>
Subject: Re: reno cwnd growth while app limited...
Message-ID: <1E528E13-B087-4318-A9CA-F08957B3F03A@netflix.com>
In-Reply-To: <SN4PR0601MB37282B52C38205880B02A81D86B00@SN4PR0601MB3728.namprd06.prod.outlook.com>
References: <CY4PR0601MB3715561F17C9C213ACFBB61586B10@CY4PR0601MB3715.namprd06.prod.outlook.com> <E86A08A0-CFB6-4292-ADBD-AEB2E1DAF64C@netflix.com> <SN4PR0601MB37282B52C38205880B02A81D86B00@SN4PR0601MB3728.namprd06.prod.outlook.com>
Richard,

Yes, that is something one could do, i.e. NewCWV… We had patches for it at one time, but they were largely untested. Pacing of course changes all of this as well: BBR, for example, does not worry about this, since it is pacing the packets, so you never get a burst. We will soon be committing an updated RACK stack that also has this ability.

R

> On Sep 12, 2019, at 2:49 PM, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>
> Michael,
>
> Thanks a lot for pointing out the uperf utility - this could be configured exactly in the way I wanted to demonstrate this...
>
> In the graphs below, I traced the evolution of cwnd for a flow across the loopback in a VM.
>
> The application does 3060 writes of 10 kB each, with a 10 ms pause between them (well below the 230 ms minimum TCP idle period), and once that phase is over, it floods the session with 8 writes of 10 MB each.
>
> Currently, the stack will initially grow cwnd up to the limit set by the receiver's window (set to 1.2 MB) during the low-rate phase, where no loss occurs...
>
> Thus the application can send out a massive burst of data in a single RTT (or at line rate) when it chooses to do so...
>
> Using the guidance given by NewCWV (RFC 7661), and growing cwnd only when flightsize is larger than half of cwnd, the congestion window remains in a more reasonable range during the application-limited phase, thus limiting the maximum burst size.
>
> Growth of cwnd in SS or CA is otherwise normal, but the inverse case (an application transitioning from high throughput to low) is not addressed; I wonder if a reduction could be achieved without the timer infrastructure described in RFC 7661 (e.g. reducing cwnd by 1 MSS when flightsize is < ½ cwnd, while not in recovery…).
>
> <image004.png>
> Unlimited ssthresh:
> <image005.png>
>
> <image009.png>
>
> Richard Scheffenegger
> Consulting Solution Architect
> NAS & Networking
>
> NetApp
> +43 1 3676 811 3157 Direct Phone
> +43 664 8866 1857 Mobile Phone
> Richard.Scheffenegger@netapp.com
>
> https://ts.la/richard49892
>
> -----Original Message-----
> From: Randall Stewart <rrs@netflix.com>
> Sent: Wednesday, 11 September 2019 14:18
> To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
> Cc: Lawrence Stewart <lstewart@netflix.com>; Michael Tuexen <tuexen@FreeBSD.org>; Jonathan Looney <jtl@netflix.com>; freebsd-transport@freebsd.org; Cui, Cheng <Cheng.Cui@netapp.com>; Tom Jones <thj@freebsd.org>; bz@freebsd.org; Eggert, Lars <lars@netapp.com>
> Subject: Re: reno cwnd growth while app limited...
>
> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> Interesting graph :)
>
> I know that years ago I had a discussion along these lines (talking about burst limits) with Kacheong Poon and Mark Allman. IIRC Kacheong said that, at the time, Sun limited the cwnd to something like 4 MSS more than the flight size (I could have that mixed up, though, and it might have been Mark proposing that... it's been a while; Sun was still a company then :D).
>
> On the other hand, I am not sure that such a tight limit takes into account all of the ACK artifacts that seem to be rampant in the Internet now. BBR took the approach of limiting its cwnd to 2x BDP (or at least what it thought the BDP was), which is more along the lines of your 0.5, if I am reading you right.
>
> It might be something worth looking into, but I would want to contemplate it for a while :)
>
> R
>
> > On Sep 11, 2019, at 8:04 AM, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> >
> > Hi,
> >
> > I was just looking at some graph data running two parallel DCTCP flows against a cubic receiver (some internal validation) with traditional ECN feedback.
> >
> > <image002.jpg>
> >
> > Now, in the beginning, a single flow cannot overutilize the link capacity and never runs into any loss/mark… but snd_cwnd grows unbounded (since DCTCP uses the newreno "cc_ack_received" mechanism).
> >
> > However, newreno_ack_received is only supposed to grow snd_cwnd when CCF_CWND_LIMITED is set, which remains set as long as snd_cwnd < snd_wnd (the receive window signaled by the receiver).
> >
> > But is this still* the correct behavior?
> >
> > Say the data flow rate is application limited (every n milliseconds, a few kB), and the receiver has signaled a large window – cwnd will grow until it matches the receiver's window. If the application then chooses to no longer restrict itself, it could burst out significantly more data than the queuing of the path can handle…
> >
> > So, shouldn't there be a second condition for cwnd growth, e.g. that pipe (flightsize) is close to cwnd (a factor of 0.5 during slow start, and say 0.85 during congestion avoidance), to prevent sudden large bursts when a flow comes out of being application limited? The intention here would be to restrict the worst-case burst that could be sent out (which is dealt with differently in other stacks), so that it ideally still fits into the path's queues…
> >
> > RFC 5681 is silent on application-limited flows, though (but one could think of an application limiting a flow as another form of congestion, during which cwnd shouldn't grow…).
> >
> > In the example above, growing cwnd to about 500 kB and then remaining there would be approximately the expected setting – based on the average of the two competing flows hovering at around 200-250 kB…
> >
> > *) I'm referring to the much higher likelihood nowadays that the application's own pacing and transfer volume violate the design principle of TCP, where the implicit assumption was that the sender has unlimited data to send, with the timing controlled at the full discretion of TCP.
> >
> > Richard Scheffenegger
> > Consulting Solution Architect
> > NAS & Networking
> >
> > NetApp
> > +43 1 3676 811 3157 Direct Phone
> > +43 664 8866 1857 Mobile Phone
> > Richard.Scheffenegger@netapp.com
> >
> > <image004.jpg>
> >
> > <image006.jpg> <image012.jpg>
> > #DataDriven
> >
> > https://ts.la/richard49892
>
> ------
> Randall Stewart
> rrs@netflix.com

------
Randall Stewart
rrs@netflix.com
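[Archive note: the flightsize gate discussed in this thread, i.e. grow cwnd only while the flight is at least half of cwnd in slow start (0.85 of cwnd in congestion avoidance), can be sketched as below. This is a simplified, hypothetical model: the names (cc_state, cwnd_may_grow, ack_received) and the fixed MSS are invented for illustration and are not FreeBSD's actual cc_newreno code.]

```c
#include <stdint.h>

#define MSS 1460 /* illustrative fixed segment size, bytes */

/* Minimal per-connection congestion control state. */
struct cc_state {
	uint64_t cwnd;     /* congestion window, bytes */
	uint64_t ssthresh; /* slow-start threshold, bytes */
};

/*
 * Gate from the thread: allow cwnd growth only when the flow is
 * actually using a large enough fraction of its window, so an
 * application-limited flow cannot inflate cwnd far beyond its
 * real sending rate.  Integer math avoids floating point.
 */
static int
cwnd_may_grow(const struct cc_state *cc, uint64_t flightsize)
{
	if (cc->cwnd < cc->ssthresh)                /* slow start */
		return (flightsize * 2 >= cc->cwnd);    /* flight >= cwnd/2 */
	return (flightsize * 100 >= cc->cwnd * 85); /* CA: >= 0.85*cwnd */
}

/* Per-ACK update: classic Reno growth, applied only when the gate allows. */
static void
ack_received(struct cc_state *cc, uint64_t flightsize, uint64_t acked)
{
	if (!cwnd_may_grow(cc, flightsize))
		return;                                 /* app-limited: no growth */
	if (cc->cwnd < cc->ssthresh)
		cc->cwnd += acked < MSS ? acked : MSS;  /* slow start: ~+MSS/ACK */
	else
		cc->cwnd += (uint64_t)MSS * MSS / cc->cwnd; /* CA: ~+MSS per RTT */
}
```

With this gate, cwnd stops tracking the receiver's window during the low-rate phase and instead tracks roughly twice (or about 1.2x, in CA) what the sender actually keeps in flight, which bounds the burst available when the application suddenly stops limiting itself.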
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1E528E13-B087-4318-A9CA-F08957B3F03A>