Date: Thu, 10 Sep 2020 09:49:13 +0000
From: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To: Liang Tian <l.tian.email@gmail.com>
Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
Subject: RE: Fast recovery ssthresh value
Message-ID: <SN4PR0601MB372817A4C0D80D981B1CE52586270@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>
References: <CAJhigrhbguXQzeYGfMtPRK03fp6KR65q8gjB9e9L-5tGGsuyzQ@mail.gmail.com> <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com> <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com>
Hi Liang,

Yes, you are absolutely correct about this observation. The SACK loss recovery will only send one MSS per received ACK right now - and when ACK thinning is present, it will fail to recover all the missing packets in time, eventually receiving no more ACKs to clock out further retransmissions...

I have a Diff in review to implement Proportional Rate Reduction: https://reviews.freebsd.org/D18892

It should address not only the ACK thinning issue, but also the issue that the current SACK loss recovery has to wait until pipe drops below ssthresh before the retransmissions are clocked out. And then they would actually be clocked out at the same rate as the incoming ACKs. This would be the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, it was observed that this behavior too can lead to self-inflicted loss - of the retransmissions.

If you have the ability to patch your kernel with D18892 and observe how it reacts in your dramatic ACK thinning scenario, that would be good to know! The assumption of the patch was that - as per TCP RFC requirements - there is one ACK for each received out-of-sequence data segment, and that ACK drops / thinning do not happen on the massive scale you describe.

Best regards,
   Richard Scheffenegger

-----Original Message-----
From: owner-freebsd-transport@freebsd.org <owner-freebsd-transport@freebsd.org> On Behalf Of Liang Tian
Sent: Wednesday, September 9, 2020 19:16
To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
Subject: Re: Fast recovery ssthresh value

Hi Richard,

Thanks for the explanation and sorry for the late reply.
I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported previously (https://reviews.freebsd.org/D26120), and I do believe there is a deviation from RFC 3517.

The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK thinning, or with the receiver enabling LRO. This means the SACK block edges can expand by more than 1 SMSS (we've seen 30*SMSS), i.e. a single DupAck can decrement `pipe` by more than 1 SMSS. RFC 3517 says:

   (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
   (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)

So based on the RFC, the sender should be able to send more segments when such a DupAck is received, because of the big change to `pipe`.

In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd, and snd_cwnd is incremented by only 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS - the deviation:

307         cwin =
308             imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);

As a result, SACK is not doing enough recovery in this scenario and the loss has to be recovered by RTO.

Again, I'd appreciate feedback from the community.

Regards,
Liang Tian

On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>
> Hi Liang,
>
> In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in the case of CUBIC]) lost bytes - at least in theory.
>
> In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
>
> For historic reasons, the semantics of the variable cwnd are overloaded during loss recovery: it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
>
> In both cases (also the simple one, with only one packet loss), cwnd should be equal (or near equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow start. But numerically, these different phases (slow start / SACK loss recovery) may appear very similar.
>
> You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if congestion/loss recovery is active), ssthresh, cwnd, and other parameters.
>
> That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation from RFC 3517? Note that FreeBSD currently has not fully implemented RFC 6675 support (which deviates slightly from 3517 under specific circumstances); I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517.
>
> BTW: While freebsd-net is not the wrong list per se, TCP, UDP and SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
>
> Best regards,
>
>    Richard Scheffenegger
>
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On Behalf Of Liang Tian
> Sent: Sunday, August 23, 2020 00:14
> To: freebsd-net <freebsd-net@freebsd.org>
> Subject: Fast recovery ssthresh value
>
> Hi all,
>
> When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
>
> 2593                if (tp->t_flags & TF_SACK_PERMIT) {
> 2594                        TCPSTAT_INC(
> 2595                            tcps_sack_recovery_episode);
> 2596                        tp->snd_recover = tp->snd_nxt;
> 2597                        tp->snd_cwnd = maxseg;
> 2598                        (void) tp->t_fb->tfb_tcp_output(tp);
> 2599                        goto drop;
> 2600                }
>
> Otherwise (SACK is not in use), CWND is set to maxseg before tcp_output() and then set back to snd_ssthresh + inflation:
>
> 2601                tp->snd_nxt = th->th_ack;
> 2602                tp->snd_cwnd = maxseg;
> 2603                (void) tp->t_fb->tfb_tcp_output(tp);
> 2604                KASSERT(tp->snd_limited <= 2,
> 2605                    ("%s: tp->snd_limited too big",
> 2606                    __func__));
> 2607                tp->snd_cwnd = tp->snd_ssthresh +
> 2608                    maxseg *
> 2609                    (tp->t_dupacks - tp->snd_limited);
> 2610                if (SEQ_GT(onxt, tp->snd_nxt))
> 2611                        tp->snd_nxt = onxt;
> 2612                goto drop;
>
> I'm wondering, in the SACK case, should CWND be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like in the non-SACK case, instead of doing slow start from maxseg?
> I read RFC 6675 and a few others, and it looks like that's the case. I'd appreciate your opinion, again.
>
> Thanks,
> Liang
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

_______________________________________________
freebsd-transport@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-transport
To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"