Date:      Fri, 11 Sep 2020 00:05:30 -0400
From:      Liang Tian <l.tian.email@gmail.com>
To:        "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>,  FreeBSD Transport <freebsd-transport@freebsd.org>
Subject:   Re: Fast recovery ssthresh value
Message-ID:  <CAJhigrgZDE4TURO+LJPr5nK--O+PwV4-cPHYJXdk08_K8GBkwQ@mail.gmail.com>
In-Reply-To: <SN4PR0601MB372817A4C0D80D981B1CE52586270@SN4PR0601MB3728.namprd06.prod.outlook.com>
References:  <CAJhigrhbguXQzeYGfMtPRK03fp6KR65q8gjB9e9L-5tGGsuyzQ@mail.gmail.com> <SN4PR0601MB3728D1F8ABC9C86972B6C53886590@SN4PR0601MB3728.namprd06.prod.outlook.com> <CAJhigrjdRzK5fKpE9jTQM5p-wzKUBALK7Cc34_Qbi7HCZ_NCXw@mail.gmail.com> <SN4PR0601MB372817A4C0D80D981B1CE52586270@SN4PR0601MB3728.namprd06.prod.outlook.com>

Hi Richard,

Thanks! I was able to apply the patches; I'll test them.

Regards,
Liang



On Thu, Sep 10, 2020 at 5:49 AM Scheffenegger, Richard
<Richard.Scheffenegger@netapp.com> wrote:
>
> Hi Liang,
>
> Yes, you are absolutely correct about this observation. SACK loss recovery
> will only send one MSS per received ACK right now - and when ACK thinning
> is present, it will fail to recover all the missing packets in time,
> eventually receiving no more ACKs to clock out further retransmissions...
>
> I have a Diff in review to implement Proportional Rate Reduction (PRR):
>
> https://reviews.freebsd.org/D18892
>
> That should address not only the ACK-thinning issue, but also the fact
> that current SACK loss recovery has to wait until pipe drops below
> ssthresh before the retransmissions are clocked out. And then they would
> actually be clocked out at the same rate as the incoming ACKs - the same
> rate at which the overload happened (barring any ACK thinning). As a
> secondary effect, it was observed that this behavior too can lead to
> self-inflicted loss - of retransmissions.
>
> If you have the ability to patch your kernel with D18892 and observe how
> it reacts in your dramatic ACK-thinning scenario, that would be good to
> know! The assumption of the patch was that - as per TCP RFC requirements -
> there is one ACK for each received out-of-sequence data segment, and that
> ACK drops / thinning are not happening on the massive scale you describe.
>
> Best regards,
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: owner-freebsd-transport@freebsd.org <owner-freebsd-transport@freebsd.org> On Behalf Of Liang Tian
> Sent: Wednesday, 9 September 2020 19:16
> To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
> Cc: FreeBSD Transport <freebsd-transport@freebsd.org>
> Subject: Re: Fast recovery ssthresh value
>
> Hi Richard,
>
> Thanks for the explanation and sorry for the late reply.
> I've been investigating SACK loss recovery and I think I'm seeing an
> issue similar to the ABC L value issue that I reported previously
> (https://reviews.freebsd.org/D26120), and I do believe there is a
> deviation from RFC 3517:
> The issue happens when a DupAck is received during SACK loss recovery in
> the presence of ACK thinning, or with a receiver enabling LRO, which
> means the SACK block edges could expand by more than 1 SMSS (we've seen
> 30*SMSS), i.e. a single DupAck could decrement `pipe` by more than 1
> SMSS.
> In RFC 3517:
>     (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more
>         segments...
>     (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
> So based on the RFC, the sender should be able to send more segments when
> such a DupAck is received, because of the big change to `pipe`.
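>
> Put differently, rule (C) is a loop. A toy model of how many segments it
> permits per ACK (my own simplification, not kernel code):
>
>     #include <stdint.h>
>
>     /*
>      * Segments RFC 3517 rule (C) allows on one ACK: keep sending
>      * while cwnd - pipe >= 1 SMSS, each transmission adding one
>      * SMSS back to pipe (steps (C.4)/(C.5)).
>      */
>     static int
>     rule_c_segments(int64_t cwnd, int64_t pipe, int64_t smss)
>     {
>         int n = 0;
>
>         while (cwnd - pipe >= smss) {
>             pipe += smss;
>             n++;
>         }
>         return (n);
>     }
>
> So an ACK that releases 30 SMSS from `pipe` should permit roughly 30
> transmissions, not one.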
>
> In the current implementation, the cwin variable, which controls the
> amount of data that can be transmitted based on the new information, is
> dictated by snd_cwnd, and snd_cwnd is incremented by only 1 SMSS for each
> DupAck received. I believe this effectively limits the retransmission
> triggered by each DupAck to 1 SMSS - a deviation from the RFC:
>
>     cwin = imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
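>
> A back-of-the-envelope illustration of that limit: on entering recovery,
> snd_cwnd is reset to 1 maxseg, and every later DupAck both adds 1 maxseg
> to snd_cwnd and (via the retransmission it triggers) about 1 maxseg to
> sack_bytes_rxmt. The two terms track each other, so cwin works out to
> about one maxseg per DupAck - no matter whether that DupAck released 1
> SMSS or 30 SMSS from `pipe`.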
>
> As a result, SACK is not doing enough recovery in this scenario and loss
> has to be recovered by RTO.
> Again, I'd appreciate feedback from the community.
>
> Regards,
> Liang Tian
>
>
>
>
> On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> >
> > Hi Liang,
> >
> > In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or
> > 70% in case of cubic]) lost bytes - at least in theory.
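> >
> > For example, with round numbers: with a prior cwnd of 100 SMSS,
> > ssthresh becomes 50 SMSS after NewReno's halving, or 70 SMSS with
> > CUBIC's 0.7 multiplicative decrease, so losses worth up to that much
> > data can in principle be repaired within a single recovery episode.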
> >
> > In comparison, (New)Reno can only recover one lost packet per window,
> > and then keeps on transmitting new segments (ack + cwnd), even before
> > the receipt of the retransmitted packet is acked.
> >
> > For historic reasons, the semantics of the cwnd variable are overloaded
> > during loss recovery: it doesn't "really" indicate cwnd, but rather
> > indicates if/when retransmissions can happen.
> >
> >
> > In both cases (also the simple one, with only one packet loss), cwnd
> > should be equal (or nearly equal) to ssthresh by the time loss recovery
> > is finished - but NOT before! While it may appear like slow-start, the
> > value of the cwnd variable really increases by acked_bytes only per ACK
> > (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move to
> > the right - unlike during slow-start. But numerically, these different
> > phases (slow-start / SACK loss recovery) may appear very similar.
> >
> > You could check this using the (loadable) SIFTR module, which captures
> > t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd,
> > and other parameters.
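> >
> > (If it helps: the module can typically be loaded with "kldload siftr"
> > and turned on via the net.inet.siftr.enabled sysctl; see siftr(4) for
> > the details of the log format.)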
> >
> > That is at least how things are supposed to work; or have you
> > investigated the timing and behavior of SACK loss recovery and found a
> > deviation from RFC 3517? Note that FreeBSD currently has not fully
> > implemented RFC 6675 support (which deviates slightly from 3517 under
> > specific circumstances); I have a patch pending to implement the 6675
> > rescue retransmissions, but haven't tweaked the other aspects of 6675
> > vs. 3517.
> >
> > BTW: While freebsd-net is not the wrong DL per se, TCP-, UDP-, and
> > SCTP-specific questions can also be posted to freebsd-transport, which
> > is more narrowly focused.
> >
> > Best regards,
> >
> > Richard Scheffenegger
> >
> > -----Original Message-----
> > From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On
> > Behalf Of Liang Tian
> > Sent: Sunday, 23 August 2020 00:14
> > To: freebsd-net <freebsd-net@freebsd.org>
> > Subject: Fast recovery ssthresh value
> >
> > Hi all,
> >
> > When 3 DupAcks are received and TCP enters fast recovery, if SACK is in
> > use, the CWND is set to maxseg:
> >
> > 2593                     if (tp->t_flags & TF_SACK_PERMIT) {
> > 2594                         TCPSTAT_INC(
> > 2595                             tcps_sack_recovery_episode);
> > 2596                         tp->snd_recover = tp->snd_nxt;
> > 2597                         tp->snd_cwnd = maxseg;
> > 2598                         (void) tp->t_fb->tfb_tcp_output(tp);
> > 2599                         goto drop;
> > 2600                     }
> >
> > Otherwise (SACK is not in use), CWND is set to maxseg before
> > tcp_output() and then set back to snd_ssthresh + inflation:
> > 2601                     tp->snd_nxt = th->th_ack;
> > 2602                     tp->snd_cwnd = maxseg;
> > 2603                     (void) tp->t_fb->tfb_tcp_output(tp);
> > 2604                     KASSERT(tp->snd_limited <= 2,
> > 2605                         ("%s: tp->snd_limited too big",
> > 2606                         __func__));
> > 2607                     tp->snd_cwnd = tp->snd_ssthresh +
> > 2608                          maxseg *
> > 2609                          (tp->t_dupacks - tp->snd_limited);
> > 2610                     if (SEQ_GT(onxt, tp->snd_nxt))
> > 2611                         tp->snd_nxt =3D onxt;
> > 2612                     goto drop;
> >
> > I'm wondering: in the SACK case, should CWND be set back to ssthresh
> > (which has been slashed in cc_cong_signal() a few lines above) before
> > line 2599, like in the non-SACK case, instead of doing slow start from
> > maxseg?
> > I read RFC 6675 and a few others, and it looks like that's the case. I
> > appreciate your opinion, again.
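> >
> > Concretely, the change I have in mind would mirror the structure of the
> > non-SACK path above - a hypothetical, untested sketch:
> >
> >                         tp->snd_recover = tp->snd_nxt;
> >                         /* Send the first retransmission right away. */
> >                         tp->snd_cwnd = maxseg;
> >                         (void) tp->t_fb->tfb_tcp_output(tp);
> >                         /* Then let SACK clock out up to ssthresh. */
> >                         tp->snd_cwnd = tp->snd_ssthresh;
> >                         goto drop;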
> >
> > Thanks,
> > Liang


