From: Liang Tian
Date: Fri, 11 Sep 2020 00:05:30 -0400
Subject: Re: Fast recovery ssthresh value
To: "Scheffenegger, Richard", FreeBSD Transport

Hi Richard,

Thanks! I'm able to apply the patches. I'll test it.

Regards,
Liang

On Thu, Sep 10, 2020 at 5:49 AM Scheffenegger, Richard wrote:
>
> Hi Liang,
>
> Yes, you are absolutely correct about this observation. The SACK loss recovery will only send one MSS per received ACK right now - and when there is ACK thinning present, will fail to timely recover all the missing packets, eventually receiving no more ACK to clock out more retransmissions...
>
> I have a Diff in review, to implement Proportional Rate Reduction:
>
> https://reviews.freebsd.org/D18892
>
> Which should address not only that issue about ACK thinning, but also the issue that current SACK loss recovery has to wait until pipe drops below ssthresh, before the retransmissions are clocked out. And then, they would actually be clocked out at the same rate at the incoming ACKs. This would be the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, it was observed that this behavior too can lead to self-inflicted loss - of retransmissions.
>
> If you have the ability to patch your kernel with D18892 and observe how the reaction is in your dramatic ACK thinning scenario, that would be good to know! The assumption of the Patch was, that - as per TCP RFC requirements - there is one ACK for each received out-of-sequence data segment, and ACK drops / thinning are not happening on such a massive scale as you describe it.
>
> Best regards,
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: owner-freebsd-transport@freebsd.org On Behalf Of Liang Tian
> Sent: Wednesday, 9 September 2020 19:16
> To: Scheffenegger, Richard
> Cc: FreeBSD Transport
> Subject: Re: Fast recovery ssthresh value
>
> Hi Richard,
>
> Thanks for the explanation and sorry for the late reply.
> I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported previously (https://reviews.freebsd.org/D26120) and I do believe there is a deviation to RFC3517:
> The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK Thinning or receiver enabling LRO, which means the SACK block edges could expand by more than 1 SMSS (We've seen 30*SMSS), i.e. a single DupAck could decrement `pipe` by more than 1 SMSS.
> In RFC3517,
> (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
> (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
> So based on RFC, the sender should be able to send more segments if such DupAck is received, because of the big change to `pipe`.
>
> In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd. The snd_cwnd is incremented by 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS - deviation.
>     cwin =
>         imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
>
> As a result, SACK is not doing enough recovery in this scenario and loss has to be recovered by RTO.
> Again, I'd appreciate feedback from the community.
>
> Regards,
> Liang Tian
>
> On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard wrote:
> >
> > Hi Liang,
> >
> > In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory.
> >
> > In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
> >
> > For historic reasons, the semantic of the variable cwnd is overloaded during loss recovery, and it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
> >
> > In both cases (also the simple one, with only one packet loss), cwnd should be equal (or near equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow-start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow-start. But numerically, these different phases (slow-start / sack loss-recovery) may appear very similar.
> >
> > You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and other parameters.
> >
> > That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation to RFC3517? Note that FBSD currently has not fully implemented RFC6675 support (which deviates slightly from 3517 under specific circumstances; I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517).
> >
> > BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
> >
> > Best regards,
> >
> > Richard Scheffenegger
> >
> > -----Original Message-----
> > From: owner-freebsd-net@freebsd.org On Behalf Of Liang Tian
> > Sent: Sunday, 23 August 2020 00:14
> > To: freebsd-net
> > Subject: Fast recovery ssthresh value
> >
> > Hi all,
> >
> > When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
> >
> > 2593             if (tp->t_flags & TF_SACK_PERMIT) {
> > 2594                 TCPSTAT_INC(
> > 2595                     tcps_sack_recovery_episode);
> > 2596                 tp->snd_recover = tp->snd_nxt;
> > 2597                 tp->snd_cwnd = maxseg;
> > 2598                 (void) tp->t_fb->tfb_tcp_output(tp);
> > 2599                 goto drop;
> > 2600             }
> >
> > Otherwise (SACK is not in use), CWND is set to maxseg before tcp_output() and then set back to snd_ssthresh+inflation:
> >
> > 2601             tp->snd_nxt = th->th_ack;
> > 2602             tp->snd_cwnd = maxseg;
> > 2603             (void) tp->t_fb->tfb_tcp_output(tp);
> > 2604             KASSERT(tp->snd_limited <= 2,
> > 2605                 ("%s: tp->snd_limited too big",
> > 2606                 __func__));
> > 2607             tp->snd_cwnd = tp->snd_ssthresh +
> > 2608                 maxseg *
> > 2609                 (tp->t_dupacks - tp->snd_limited);
> > 2610             if (SEQ_GT(onxt, tp->snd_nxt))
> > 2611                 tp->snd_nxt = onxt;
> > 2612             goto drop;
> >
> > I'm wondering in the SACK case, should CWND be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like the non-SACK case, instead of doing slow start from maxseg?
> > I read rfc6675 and a few others, and it looks like that's the case. I appreciate your opinion, again.
> >
> > Thanks,
> > Liang
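
The gap discussed in this thread, between step (C) of RFC3517 and the `cwin` computation, can be illustrated with a small standalone C sketch. This is not FreeBSD kernel code. The function names `rfc3517_segments_to_send` and `cwin_model` are hypothetical: the first models the (C.1)..(C.5) loop as quoted from the RFC, and the second only mirrors the quoted `imax(min(...) - sack_bytes_rxmt, 0)` expression. All byte counts are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of RFC3517 step (C): while cwnd - pipe >= 1 SMSS, transmit one
 * segment and account for it in pipe. Returns the number of segments
 * that a single incoming ACK would allow the sender to transmit.
 * (Hypothetical helper, not a kernel function.)
 */
static unsigned
rfc3517_segments_to_send(uint32_t cwnd, uint32_t pipe, uint32_t smss)
{
	unsigned sent = 0;

	assert(smss > 0);
	while (cwnd >= pipe && cwnd - pipe >= smss) {
		pipe += smss;	/* the segment just sent is now in flight */
		sent++;
	}
	return (sent);
}

/*
 * Model of the quoted expression:
 *   cwin = imax(min(snd_wnd, snd_cwnd) - sack_bytes_rxmt, 0)
 * Returns the number of bytes the sender may transmit.
 * (Hypothetical helper; parameter names follow the quoted snippet.)
 */
static int32_t
cwin_model(int32_t snd_wnd, int32_t snd_cwnd, int32_t sack_bytes_rxmt)
{
	int32_t win = snd_wnd < snd_cwnd ? snd_wnd : snd_cwnd;
	int32_t cwin = win - sack_bytes_rxmt;

	return (cwin > 0 ? cwin : 0);
}
```

With an assumed SMSS of 1448 bytes, `rfc3517_segments_to_send(20 * 1448, 10 * 1448, 1448)` allows 10 segments after one large SACK block has lowered `pipe` to 10 SMSS. By contrast, `cwin_model(1000000, 2 * 1448, 1448)` yields only 1448 bytes, i.e. one segment, when `snd_cwnd` has grown by a single SMSS for that DupAck and one SMSS has already been retransmitted.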