From: Liang Tian
Date: Fri, 11 Sep 2020 13:02:29 -0400
Subject: Re: Fast recovery ssthresh value
To: "Scheffenegger, Richard"
Cc: FreeBSD Transport

Hi Richard,

Initial tests show PRR is doing quite well. See the trace below showing the response to TSval 2713381916 and 2713381917.

I have a comment on the patch: I think every tp->t_maxseg in the diff (https://reviews.freebsd.org/D18892) should be replaced with maxseg, where maxseg = tcp_maxseg(tp). This takes TCP options (timestamps in this case) into account and avoids sending the tinygrams with len 120 and 36 in the trace below.

Interestingly, we were also chasing another issue where we see a lot of 12-byte segments when retransmission happens (before applying the PRR patch). We suspect that the mixed usage of t_maxseg and maxseg = tcp_maxseg(tp) in the TCP code is causing this: the CCA modules all use t_maxseg for the CWND increase instead of the effective SMSS.
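A rough sketch of the arithmetic behind those tinygrams, assuming t_maxseg is 1400 bytes and the timestamp option costs 12 bytes per segment (both values are inferred from the 1388-byte segment lengths, not stated in this message). The function names are hypothetical and this is not FreeBSD code:

```c
#include <assert.h>

/*
 * Illustration only. A send budget expressed as a whole number of
 * segments of seg_size bytes each.
 */
unsigned int
budget_in_segments(unsigned int nsegs, unsigned int seg_size)
{
	return (nsegs * seg_size);
}

/*
 * Illustration only. When a budget is cut into segments carrying at
 * most eff_maxseg payload bytes, this is the length of the short
 * segment left at the end (0 if the budget divides evenly).
 */
unsigned int
tail_segment_len(unsigned int budget, unsigned int eff_maxseg)
{
	assert(eff_maxseg > 0);
	return (budget % eff_maxseg);
}
```

Under these assumptions, a budget of 10 segments sized by the assumed t_maxseg (14000 bytes) leaves a 120-byte tail once cut into 1388-byte payloads, and a budget of 3 segments (4200 bytes) leaves a 36-byte tail. Sizing the budget with the effective segment size (1388 bytes) leaves no tail.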
[TCP Dup ACK 41541#3] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381914 TSecr=1636604730 SLE=46785317 SRE=46790869
[TCP Dup ACK 41541#4] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381916 TSecr=1636604730 SLE=46785317 SRE=46804749
[TCP Dup ACK 41541#5] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381917 TSecr=1636604730 SLE=46785317 SRE=46808913
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44597853 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44599241 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44600629 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44602017 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44603405 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44604793 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44606181 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44607569 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44608957 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44610345 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44611733 Ack=156 Win=1048576 Len=120 TSval=1636604904 TSecr=2713381916
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44611853 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44613241 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44614629 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44616017 Ack=156 Win=1048576 Len=36 TSval=1636604905 TSecr=2713381917
[TCP Dup ACK 41541#6] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381925 TSecr=1636604730 SLE=46785317 SRE=46867209
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44616053 Ack=156 Win=1048576 Len=1388 TSval=1636604912 TSecr=2713381925
[TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44617441 Ack=156 Win=1048576 Len=1388 TSval=1636604912 TSecr=2713381925

Thanks,
Liang

On Fri, Sep 11, 2020 at 3:40 AM Scheffenegger, Richard wrote:
>
> Perfect!
>
> Please share your findings then, as reviews (including informal ones) are needed prior to me committing this patch.
>
> Note that it builds upon D18624, which is currently in stable/12 and head, but not any released branches. So you may need to apply that too if you aren't using head.
>
> Best regards,
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: Liang Tian
> Sent: Friday, 11 September 2020 06:06
> To: Scheffenegger, Richard; FreeBSD Transport
> Subject: Re: Fast recovery ssthresh value
>
> Hi Richard,
>
> Thanks! I'm able to apply the patches. I'll test it.
>
> Regards,
> Liang
>
> On Thu, Sep 10, 2020 at 5:49 AM Scheffenegger, Richard wrote:
> >
> > Hi Liang,
> >
> > Yes, you are absolutely correct about this observation.
> > The SACK loss recovery will only send one MSS per received ACK right now - and when there is ACK thinning present, will fail to timely recover all the missing packets, eventually receiving no more ACKs to clock out more retransmissions...
> >
> > I have a Diff in review, to implement Proportional Rate Reduction:
> >
> > https://reviews.freebsd.org/D18892
> >
> > Which should address not only that issue about ACK thinning, but also the issue that current SACK loss recovery has to wait until pipe drops below ssthresh before the retransmissions are clocked out. And then, they would actually be clocked out at the same rate as the incoming ACKs. This would be the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, it was observed that this behavior too can lead to self-inflicted loss - of retransmissions.
> >
> > If you have the ability to patch your kernel with D18892 and observe how it reacts in your dramatic ACK thinning scenario, that would be good to know! The assumption of the patch was that - as per TCP RFC requirements - there is one ACK for each received out-of-sequence data segment, and ACK drops / thinning are not happening on such a massive scale as you describe.
> >
> > Best regards,
> >
> > Richard Scheffenegger
> >
> > -----Original Message-----
> > From: owner-freebsd-transport@freebsd.org On Behalf Of Liang Tian
> > Sent: Wednesday, 9 September 2020 19:16
> > To: Scheffenegger, Richard
> > Cc: FreeBSD Transport
> > Subject: Re: Fast recovery ssthresh value
> >
> > Hi Richard,
> >
> > Thanks for the explanation and sorry for the late reply.
> > I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported previously (https://reviews.freebsd.org/D26120), and I do believe there is a deviation from RFC3517:
> > The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK thinning or a receiver with LRO enabled, which means the SACK block edges could expand by more than 1 SMSS (we've seen 30*SMSS), i.e. a single DupAck could decrement `pipe` by more than 1 SMSS.
> > In RFC3517,
> > (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
> > (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
> > So based on the RFC, the sender should be able to send more segments if such a DupAck is received, because of the big change to `pipe`.
> >
> > In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd. The snd_cwnd is incremented by 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS, which is the deviation.
> >         cwin =
> >             imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
> >
> > As a result, SACK is not doing enough recovery in this scenario and the loss has to be recovered by RTO.
> > Again, I'd appreciate feedback from the community.
> >
> > Regards,
> > Liang Tian
> >
> > On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard wrote:
> > >
> > > Hi Liang,
> > >
> > > In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory.
> > >
> > > In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
> > >
> > > For historic reasons, the semantic of the variable cwnd is overloaded during loss recovery, and it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
> > >
> > > In both cases (also the simple one, with only one packet loss), cwnd should be equal (or near equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow-start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow-start. But numerically, these different phases (slow-start / sack loss-recovery) may appear very similar.
> > >
> > > You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and other parameters.
> > >
> > > That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation from RFC3517? Note that FBSD currently has not fully implemented RFC6675 support (which deviates slightly from 3517 under specific circumstances; I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517).
> > >
> > > BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
> > >
> > > Best regards,
> > >
> > > Richard Scheffenegger
> > >
> > > -----Original Message-----
> > > From: owner-freebsd-net@freebsd.org On Behalf Of Liang Tian
> > > Sent: Sunday, 23 August 2020 00:14
> > > To: freebsd-net
> > > Subject: Fast recovery ssthresh value
> > >
> > > Hi all,
> > >
> > > When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
> > >
> > > 2593         if (tp->t_flags & TF_SACK_PERMIT) {
> > > 2594                 TCPSTAT_INC(
> > > 2595                     tcps_sack_recovery_episode);
> > > 2596                 tp->snd_recover = tp->snd_nxt;
> > > 2597                 tp->snd_cwnd = maxseg;
> > > 2598                 (void) tp->t_fb->tfb_tcp_output(tp);
> > > 2599                 goto drop;
> > > 2600         }
> > >
> > > Otherwise (SACK is not in use), CWND is set to maxseg before tcp_output() and then set back to snd_ssthresh + inflation:
> > >
> > > 2601         tp->snd_nxt = th->th_ack;
> > > 2602         tp->snd_cwnd = maxseg;
> > > 2603         (void) tp->t_fb->tfb_tcp_output(tp);
> > > 2604         KASSERT(tp->snd_limited <= 2,
> > > 2605             ("%s: tp->snd_limited too big",
> > > 2606             __func__));
> > > 2607         tp->snd_cwnd = tp->snd_ssthresh +
> > > 2608             maxseg *
> > > 2609             (tp->t_dupacks - tp->snd_limited);
> > > 2610         if (SEQ_GT(onxt, tp->snd_nxt))
> > > 2611                 tp->snd_nxt = onxt;
> > > 2612         goto drop;
> > >
> > > I'm wondering whether, in the SACK case, CWND should be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like the non-SACK case, instead of doing slow start from maxseg?
> > > I read rfc6675 and a few others, and it looks like that's the case. I appreciate your opinion, again.
> > >
> > > Thanks,
> > > Liang
> > > _______________________________________________
> > > freebsd-net@freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> > _______________________________________________
> > freebsd-transport@freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-transport
> > To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"
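
For reference, the RFC 3517 step (C) rule quoted in the 9 September message can be sketched roughly as below. This is a simplified illustration, not the FreeBSD implementation: the function name is hypothetical, every segment is assumed to be exactly one SMSS, and the receive window and the choice of which sequence range to send are ignored.

```c
#include <assert.h>

/*
 * Illustration only. Counts how many SMSS-sized segments the RFC 3517
 * step (C) loop would allow: keep sending while cwnd - pipe >= 1 SMSS,
 * adding each transmitted segment to pipe, as step (C.4) requires.
 */
unsigned int
rfc3517_segments_allowed(unsigned int cwnd, unsigned int pipe,
    unsigned int smss)
{
	unsigned int nsegs = 0;

	assert(smss > 0);
	/* The cwnd >= pipe check avoids unsigned underflow. */
	while (cwnd >= pipe && cwnd - pipe >= smss) {
		pipe += smss;	/* account for the segment just sent */
		nsegs++;
	}
	return (nsegs);
}
```

As an example with made-up numbers: with smss = 1388 and cwnd = 40 * smss, a single DupAck that lowers pipe from 40 * smss to 10 * smss would allow 30 segments under this rule. The behavior described in the thread would send one segment for that DupAck.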