Date: Fri, 11 Sep 2020 00:05:54 -0400
From: Liang Tian <l.tian.email@gmail.com>
To: Randall Stewart <rrs@netflix.com>
Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
Subject: Re: Fast recovery ssthresh value
Message-ID: <CAJhigrhy1JeBvmUduvnyfGFd9cTgYSfgcP4kwR3RtMqEUdOhsQ@mail.gmail.com>
In-Reply-To: <A982EE58-1F2F-400B-B8AA-9B3B5523826B@netflix.com>
References: <CAJhigrhbguXQzeYGfMtPRK03fp6KR65q8gjB9e9L-5tGGsuyzQ@mail.gmail.com>
 <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>
 <SN4PR0601MB372817A4C0D80D981B1CE52586270@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <A982EE58-1F2F-400B-B8AA-9B3B5523826B@netflix.com>
Hi Randall,

Yes, rack is definitely the next thing I would experiment with. We are using the networking code in user space and I was able to integrate the default and bbr stacks. I still need to integrate rack (something is off) and also solve some problems with timer granularity.
Thanks and I'll probably come back with questions on rack soon :)

Regards,
Liang

On Thu, Sep 10, 2020 at 9:35 AM Randall Stewart <rrs@netflix.com> wrote:
>
> Liang:
>
> Or if you are on head, you can use rack which not only
> has PRR built into it, but also has Rack and TLP as
> well.
>
> Of course it's only in Head unless you want to go to the effort
> of back-porting it :)
>
> Note that NF uses this stack for all of its TCP connections in
> the Big-I (but of course we use Head too) :)
>
> R
>
>
> > On Sep 10, 2020, at 5:49 AM, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> >
> > Hi Liang,
> >
> > Yes, you are absolutely correct about this observation. The SACK loss recovery will only send one MSS per received ACK right now - and when there is ACK thinning present, will fail to timely recover all the missing packets, eventually receiving no more ACK to clock out more retransmissions...
> >
> > I have a Diff in review, to implement Proportional Rate Reduction:
> >
> > https://reviews.freebsd.org/D18892
> >
> > Which should address not only that issue about ACK thinning, but also the issue that current SACK loss recovery has to wait until pipe drops below ssthresh, before the retransmissions are clocked out. And then, they would actually be clocked out at the same rate as the incoming ACKs. This would be the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, it was observed that this behavior too can lead to self-inflicted loss - of retransmissions.
> >
> > If you have the ability to patch your kernel with D18892 and observe how the reaction is in your dramatic ACK thinning scenario, that would be good to know!
> > The assumption of the Patch was, that - as per TCP RFC requirements - there is one ACK for each received out-of-sequence data segment, and ACK drops / thinning are not happening on such a massive scale as you describe it.
> >
> > Best regards,
> >
> > Richard Scheffenegger
> >
> > -----Original Message-----
> > From: owner-freebsd-transport@freebsd.org <owner-freebsd-transport@freebsd.org> On Behalf Of Liang Tian
> > Sent: Wednesday, 9 September 2020 19:16
> > To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
> > Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
> > Subject: Re: Fast recovery ssthresh value
> >
> > Hi Richard,
> >
> > Thanks for the explanation and sorry for the late reply.
> > I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported
> > previously (https://reviews.freebsd.org/D26120) and I do believe there is a deviation from RFC3517:
> > The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK Thinning or receiver enabling LRO, which means the SACK block edges could expand by more than 1 SMSS (we've seen 30*SMSS), i.e. a single DupAck could decrement `pipe` by more than 1 SMSS.
> > In RFC3517,
> > (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
> > (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
> > So based on the RFC, the sender should be able to send more segments if such a DupAck is received, because of the big change to `pipe`.
> >
> > In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd. The snd_cwnd is incremented by 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS - a deviation.
> > 307 cwin =
> > 308     imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
> >
> > As a result, SACK is not doing enough recovery in this scenario and loss has to be recovered by RTO.
> > Again, I'd appreciate feedback from the community.
> >
> > Regards,
> > Liang Tian
> >
> >
> >
> >
> > On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> >>
> >> Hi Liang,
> >>
> >> In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory.
> >>
> >> In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
> >>
> >> For historic reasons, the semantic of the variable cwnd is overloaded during loss recovery, and it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
> >>
> >>
> >> In both cases (also the simple one, with only one packet loss), cwnd should be equal (or near equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow-start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow-start. But numerically, these different phases (slow-start / sack loss-recovery) may appear very similar.
> >>
> >> You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and other parameters.
> >>
> >> That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation to RFC3517?
> >> Note that FBSD currently has not fully implemented RFC6675 support (which deviates slightly from 3517 under specific circumstances; I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517).
> >>
> >> BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
> >>
> >> Best regards,
> >>
> >> Richard Scheffenegger
> >>
> >> -----Original Message-----
> >> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On Behalf Of Liang Tian
> >> Sent: Sunday, 23 August 2020 00:14
> >> To: freebsd-net <freebsd-net@freebsd.org>
> >> Subject: Fast recovery ssthresh value
> >>
> >> Hi all,
> >>
> >> When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
> >>
> >> 2593 if (tp->t_flags & TF_SACK_PERMIT) {
> >> 2594         TCPSTAT_INC(
> >> 2595             tcps_sack_recovery_episode);
> >> 2596         tp->snd_recover = tp->snd_nxt;
> >> 2597         tp->snd_cwnd = maxseg;
> >> 2598         (void) tp->t_fb->tfb_tcp_output(tp);
> >> 2599         goto drop;
> >> 2600 }
> >>
> >> Otherwise (SACK is not in use), CWND is set to maxseg before
> >> tcp_output() and then set back to snd_ssthresh+inflation
> >> 2601 tp->snd_nxt = th->th_ack;
> >> 2602 tp->snd_cwnd = maxseg;
> >> 2603 (void) tp->t_fb->tfb_tcp_output(tp);
> >> 2604 KASSERT(tp->snd_limited <= 2,
> >> 2605     ("%s: tp->snd_limited too big",
> >> 2606     __func__));
> >> 2607 tp->snd_cwnd = tp->snd_ssthresh +
> >> 2608     maxseg *
> >> 2609     (tp->t_dupacks - tp->snd_limited);
> >> 2610 if (SEQ_GT(onxt, tp->snd_nxt))
> >> 2611         tp->snd_nxt = onxt;
> >> 2612 goto drop;
> >>
> >> I'm wondering in the SACK case, should CWND be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like the non-SACK case, instead of doing slow start from maxseg?
> >> I read rfc6675 and a few others, and it looks like that's the case. I appreciate your opinion, again.
> >>
> >> Thanks,
> >> Liang
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> > _______________________________________________
> > freebsd-transport@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-transport
> > To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"
>
> ------
> Randall Stewart
> rrs@netflix.com
>
>
>
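
For readers following the thread, the difference Liang describes between RFC3517 step (C) and the quoted cwin expression can be sketched in a few lines of C. This is only an illustration under stated assumptions, not FreeBSD code: the function names rfc3517_segments_allowed() and recovery_cwin() are hypothetical, every retransmission is assumed to be exactly one SMSS, and the second function merely mirrors the imax(min(...) - sack_bytes_rxmt, 0) expression quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not FreeBSD code): number of SMSS-sized segments
 * that RFC3517 step (C) would release for a given cwnd and pipe.
 */
static uint32_t
rfc3517_segments_allowed(uint32_t cwnd, uint32_t pipe, uint32_t smss)
{
	uint32_t nseg = 0;

	assert(smss > 0);	/* a zero SMSS would never terminate the loop */

	/* (C) and (C.5): keep going while cwnd - pipe >= 1 SMSS. */
	while (cwnd >= pipe && cwnd - pipe >= smss) {
		pipe += smss;	/* (C.4): transmitted octets are added to pipe */
		nseg++;
	}
	return (nseg);
}

/*
 * Hypothetical mirror of the expression quoted in the thread:
 *   cwin = imax(min(snd_wnd, snd_cwnd) - sack_bytes_rxmt, 0)
 */
static long
recovery_cwin(long snd_wnd, long snd_cwnd, long sack_bytes_rxmt)
{
	long w = (snd_wnd < snd_cwnd ? snd_wnd : snd_cwnd) - sack_bytes_rxmt;

	return (w > 0 ? w : 0);
}
```

As an example with assumed numbers: with SMSS = 1460 and cwnd = 20 segments, a single DupAck whose SACK blocks drop pipe from 20 to 12 segments lets rfc3517_segments_allowed() release 8 segments. If snd_cwnd has grown to 4 SMSS while 3 SMSS have already been retransmitted, recovery_cwin() leaves room for only one more SMSS, which matches the one-segment-per-DupAck behavior described in the thread.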