From: Liang Tian
Date: Tue, 15 Sep 2020 16:22:04 -0400
Subject: Re: Fast recovery ssthresh value
To: "Scheffenegger, Richard"
Cc: FreeBSD Transport
List-Id: Discussions of transport level network protocols in FreeBSD

Hi Richard,

Thanks. It works well now. We also had an observation that the majority of tinygrams were caused by the t_maxseg usage in tcp_sack_partialack(). Now with PRR we rarely see tinygrams because 1) tcp_sack_partialack() is no longer called 2) the updated PRR patch

Just one comment on the patch: line 2546 is redundant because maxseg is already defined and calculated in line 2477

Regards,
Liang

On Tue, Sep 15, 2020 at 10:35 AM Scheffenegger, Richard wrote:
>
> Hi Liang,
>
> I was about to send out this email notifying you of the changes to the patch, where you uncovered the issues with TSopt enabled TCP flows.
>
> https://reviews.freebsd.org/D18892
>
> Can you please re-patch your test machine with this updated version (I fixed one merge issue due to whitespace cleanup recently too, so it should apply cleanly to HEAD now).
>
> Please let us know and share any comments and criticism about this patch!
>
> Thanks again for testing - and finding the overlooked combination with timestamps.
>
>
> Richard Scheffenegger
>
> -----Original Message-----
> From: Liang Tian
> Sent: Friday, 11 September 2020 19:02
> To: Scheffenegger, Richard
> Cc: FreeBSD Transport
> Subject: Re: Fast recovery ssthresh value
>
> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
>
>
> Hi Richard,
>
> Initial tests show PRR is doing quite well. See trace below showing response to TSval 2713381916 and 2713381917.
> I have a comment on the patch: I think all the tp->t_maxseg should be replaced with maxseg in the diff (https://reviews.freebsd.org/D18892), where maxseg = tcp_maxseg(tp). This will take TCP options (timestamp in this case) into account and avoid sending the tinygrams with len 120 and 36 in the trace below.
> Interestingly we were also chasing another issue where we see a lot of 12-byte segments when retransmission happens (before applying the PRR patch). We suspect the mixed usage of t_maxseg and maxseg = tcp_maxseg(tp) in the TCP code is causing this: the CCA modules are all using t_maxseg for CWND increase instead of the effective SMSS.
>
> [TCP Dup ACK 41541#3] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381914 TSecr=1636604730 SLE=46785317 SRE=46790869
> [TCP Dup ACK 41541#4] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381916 TSecr=1636604730 SLE=46785317 SRE=46804749
> [TCP Dup ACK 41541#5] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381917 TSecr=1636604730 SLE=46785317 SRE=46808913
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44597853 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44599241 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44600629 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44602017 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44603405 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44604793 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44606181 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44607569 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44608957 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44610345 Ack=156 Win=1048576 Len=1388 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44611733 Ack=156 Win=1048576 Len=120 TSval=1636604904 TSecr=2713381916
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44611853 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44613241 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44614629 Ack=156 Win=1048576 Len=1388 TSval=1636604905 TSecr=2713381917
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44616017 Ack=156 Win=1048576 Len=36 TSval=1636604905 TSecr=2713381917
> [TCP Dup ACK 41541#6] 52466 > 80 [ACK] Seq=156 Ack=44596441 Win=3144704 Len=0 TSval=2713381925 TSecr=1636604730 SLE=46785317 SRE=46867209
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44616053 Ack=156 Win=1048576 Len=1388 TSval=1636604912 TSecr=2713381925
> [TCP Out-Of-Order] 80 > 52466 [ACK] Seq=44617441 Ack=156 Win=1048576 Len=1388 TSval=1636604912 TSecr=2713381925
>
> Thanks,
> Liang
> ...
>
> On Fri, Sep 11, 2020 at 3:40 AM Scheffenegger, Richard wrote:
> >
> > Perfect!
> >
> > Please share your findings then, as reviews (including informal ones) are needed prior to me committing this patch.
> >
> > Note that it builds upon D18624, which is currently in stable/12 and head, but not any released branches. So you may need to apply that too if you aren't using head.
> >
> > Best regards,
> >
> >
> > Richard Scheffenegger
> >
> > -----Original Message-----
> > From: Liang Tian
> > Sent: Friday, 11 September 2020 06:06
> > To: Scheffenegger, Richard ; FreeBSD Transport
> > Subject: Re: Fast recovery ssthresh value
> >
> > Hi Richard,
> >
> > Thanks! I'm able to apply the patches. I'll test it.
> >
> > Regards,
> > Liang
> >
> >
> >
> > On Thu, Sep 10, 2020 at 5:49 AM Scheffenegger, Richard wrote:
> > >
> > > Hi Liang,
> > >
> > > Yes, you are absolutely correct about this observation. The SACK loss recovery will only send one MSS per received ACK right now - and when there is ACK thinning present, will fail to timely recover all the missing packets, eventually receiving no more ACKs to clock out more retransmissions...
> > >
> > > I have a Diff in review, to implement Proportional Rate Reduction:
> > >
> > > https://reviews.freebsd.org/D18892
> > >
> > > Which should address not only that issue about ACK thinning, but also the issue that current SACK loss recovery has to wait until pipe drops below ssthresh, before the retransmissions are clocked out. And then, they would actually be clocked out at the same rate as the incoming ACKs. This would be the same rate as when the overload happened (barring any ACK thinning), and as a secondary effect, it was observed that this behavior too can lead to self-inflicted loss - of retransmissions.
> > >
> > > If you have the ability to patch your kernel with D18892 and observe how the reaction is in your dramatic ACK thinning scenario, that would be good to know! The assumption of the patch was that - as per TCP RFC requirements - there is one ACK for each received out-of-sequence data segment, and ACK drops / thinning are not happening on such a massive scale as you describe it.
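For reference, the proportional send-count arithmetic behind PRR can be sketched as below. This is a minimal illustration based on the algorithm text of RFC 6937, not the code in D18892. The struct, its field names and the helper prr_sndcnt() are hypothetical, and the sketch assumes the sender transmits exactly the number of bytes the helper returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PRR state; names follow RFC 6937, not the D18892 diff. */
struct prr_state {
	uint32_t ssthresh;      /* window target after the reduction, in bytes */
	uint32_t recover_fs;    /* RecoverFS: bytes in flight when recovery began */
	uint32_t prr_delivered; /* bytes delivered to the receiver during recovery */
	uint32_t prr_out;       /* bytes sent during recovery */
};

/*
 * Bytes that may be sent in response to one ACK.
 * delivered: bytes newly ACKed or SACKed by this ACK; pipe: bytes in flight.
 * Assumption: the caller sends exactly the returned amount.
 */
static uint32_t
prr_sndcnt(struct prr_state *s, uint32_t delivered, uint32_t pipe,
    uint32_t smss)
{
	int64_t sndcnt, limit, room;
	uint64_t target;

	assert(s->recover_fs > 0 && smss > 0);
	s->prr_delivered += delivered;

	if (pipe > s->ssthresh) {
		/* CEIL(prr_delivered * ssthresh / RecoverFS) - prr_out */
		target = ((uint64_t)s->prr_delivered * s->ssthresh +
		    s->recover_fs - 1) / s->recover_fs;
		sndcnt = (int64_t)target - (int64_t)s->prr_out;
	} else {
		/* Slow-start reduction bound (PRR-SSRB in RFC 6937). */
		limit = (int64_t)s->prr_delivered - (int64_t)s->prr_out;
		if (limit < (int64_t)delivered)
			limit = (int64_t)delivered;
		limit += smss;
		room = (int64_t)s->ssthresh - (int64_t)pipe;
		sndcnt = room < limit ? room : limit;
	}
	if (sndcnt < 0)
		sndcnt = 0;
	s->prr_out += (uint32_t)sndcnt;
	return ((uint32_t)sndcnt);
}
```

With RecoverFS = 40000 and ssthresh = 20000, an ACK that delivers 1000 bytes releases 500 bytes, while a thinned ACK that delivers 10000 bytes at once releases 5000 bytes. The amount sent follows the delivered data rather than the number of ACKs, which is why ACK thinning matters less here.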
> > >
> > > Best regards,
> > >
> > > Richard Scheffenegger
> > >
> > > -----Original Message-----
> > > From: owner-freebsd-transport@freebsd.org On Behalf Of Liang Tian
> > > Sent: Wednesday, 9 September 2020 19:16
> > > To: Scheffenegger, Richard
> > > Cc: FreeBSD Transport
> > > Subject: Re: Fast recovery ssthresh value
> > >
> > > Hi Richard,
> > >
> > > Thanks for the explanation and sorry for the late reply.
> > > I've been investigating SACK loss recovery and I think I'm seeing an issue similar to the ABC L value issue that I reported previously (https://reviews.freebsd.org/D26120) and I do believe there is a deviation from RFC3517:
> > > The issue happens when a DupAck is received during SACK loss recovery in the presence of ACK thinning or a receiver with LRO enabled, which means the SACK block edges could expand by more than 1 SMSS (we've seen 30*SMSS), i.e. a single DupAck could decrement `pipe` by more than 1 SMSS.
> > > In RFC3517,
> > > (C) If cwnd - pipe >= 1 SMSS, the sender SHOULD transmit one or more segments...
> > > (C.5) If cwnd - pipe >= 1 SMSS, return to (C.1)
> > > So based on the RFC, the sender should be able to send more segments if such a DupAck is received, because of the big change to `pipe`.
> > >
> > > In the current implementation, the cwin variable, which controls the amount of data that can be transmitted based on the new information, is dictated by snd_cwnd. The snd_cwnd is incremented by 1 SMSS for each DupAck received. I believe this effectively limits the retransmission triggered by each DupAck to 1 SMSS, which is the deviation.
> > > cwin =
> > >     imax(min(tp->snd_wnd, tp->snd_cwnd) - sack_bytes_rxmt, 0);
> > >
> > > As a result, SACK is not doing enough recovery in this scenario and loss has to be recovered by RTO.
> > > Again, I'd appreciate feedback from the community.
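The gap between RFC 3517 step (C) and the cwin expression quoted above can be illustrated with a small model. This is a sketch under simplifying assumptions, not FreeBSD code: rfc3517_segments_allowed() is a hypothetical helper that only counts how many full segments step (C) would permit, and sack_cwin() reproduces just the arithmetic of the quoted expression with plain integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RFC 3517 step (C): keep transmitting while cwnd - pipe >= 1 SMSS.
 * Returns the number of full-sized segments that loop would send.
 */
static unsigned
rfc3517_segments_allowed(uint32_t cwnd, uint32_t pipe, uint32_t smss)
{
	unsigned n = 0;

	assert(smss > 0);
	while (cwnd >= pipe && cwnd - pipe >= smss) {
		pipe += smss;	/* (C.4): every segment sent adds to pipe */
		n++;
	}
	return (n);
}

/*
 * Arithmetic of the quoted expression:
 * cwin = imax(min(snd_wnd, snd_cwnd) - sack_bytes_rxmt, 0)
 */
static int32_t
sack_cwin(int32_t snd_wnd, int32_t snd_cwnd, int32_t sack_bytes_rxmt)
{
	int32_t w = snd_wnd < snd_cwnd ? snd_wnd : snd_cwnd;

	w -= sack_bytes_rxmt;
	return (w > 0 ? w : 0);
}
```

Taking an SMSS of 1448 bytes as an example value: if one DupAck lowers pipe from 40 SMSS to 10 SMSS while cwnd is 40 SMSS, step (C) permits 30 segments. If snd_cwnd has only grown to 4 SMSS and 3 SMSS have already been retransmitted, the cwin expression leaves room for a single segment, whatever the size of the SACK advance.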
> > >
> > > Regards,
> > > Liang Tian
> > >
> > >
> > >
> > >
> > > On Sun, Aug 23, 2020 at 3:56 PM Scheffenegger, Richard wrote:
> > > >
> > > > Hi Liang,
> > > >
> > > > In SACK loss recovery, you can recover up to ssthresh (prior cwnd/2 [or 70% in case of cubic]) lost bytes - at least in theory.
> > > >
> > > > In comparison, (New)Reno can only recover one lost packet per window, and then keeps on transmitting new segments (ack + cwnd), even before the receipt of the retransmitted packet is acked.
> > > >
> > > > For historic reasons, the semantic of the variable cwnd is overloaded during loss recovery, and it doesn't "really" indicate cwnd, but rather indicates if/when retransmissions can happen.
> > > >
> > > >
> > > > In both cases (also the simple one, with only one packet loss), cwnd should be equal (or near equal) to ssthresh by the time loss recovery is finished - but NOT before! While it may appear like slow-start, the value of the cwnd variable really increases by acked_bytes only per ACK (not acked_bytes + SMSS), since the left edge (snd_una) doesn't move right - unlike during slow-start. But numerically, these different phases (slow-start / sack loss-recovery) may appear very similar.
> > > >
> > > > You could check this using the (loadable) SIFTR module, which captures t_flags (indicating if cong/loss recovery is active), ssthresh, cwnd, and other parameters.
> > > >
> > > > That is at least how things are supposed to work; or have you investigated the timing and behavior of SACK loss recovery and found a deviation from RFC3517? Note that FBSD currently has not fully implemented RFC6675 support (which deviates slightly from 3517 under specific circumstances; I have a patch pending to implement 6675 rescue retransmissions, but haven't tweaked the other aspects of 6675 vs. 3517).
> > > >
> > > > BTW: While freebsd-net is not the wrong DL per se, TCP, UDP, SCTP specific questions can also be posted to freebsd-transport, which is more narrowly focused.
> > > >
> > > > Best regards,
> > > >
> > > > Richard Scheffenegger
> > > >
> > > > -----Original Message-----
> > > > From: owner-freebsd-net@freebsd.org On Behalf Of Liang Tian
> > > > Sent: Sunday, 23 August 2020 00:14
> > > > To: freebsd-net
> > > > Subject: Fast recovery ssthresh value
> > > >
> > > > Hi all,
> > > >
> > > > When 3 dupacks are received and TCP enters fast recovery, if SACK is used, the CWND is set to maxseg:
> > > >
> > > > 2593 if (tp->t_flags & TF_SACK_PERMIT) {
> > > > 2594         TCPSTAT_INC(
> > > > 2595             tcps_sack_recovery_episode);
> > > > 2596         tp->snd_recover = tp->snd_nxt;
> > > > 2597         tp->snd_cwnd = maxseg;
> > > > 2598         (void) tp->t_fb->tfb_tcp_output(tp);
> > > > 2599         goto drop;
> > > > 2600 }
> > > >
> > > > Otherwise (SACK is not in use), CWND is set to maxseg before tcp_output() and then set back to snd_ssthresh+inflation:
> > > > 2601 tp->snd_nxt = th->th_ack;
> > > > 2602 tp->snd_cwnd = maxseg;
> > > > 2603 (void) tp->t_fb->tfb_tcp_output(tp);
> > > > 2604 KASSERT(tp->snd_limited <= 2,
> > > > 2605     ("%s: tp->snd_limited too big",
> > > > 2606     __func__));
> > > > 2607 tp->snd_cwnd = tp->snd_ssthresh +
> > > > 2608     maxseg *
> > > > 2609     (tp->t_dupacks - tp->snd_limited);
> > > > 2610 if (SEQ_GT(onxt, tp->snd_nxt))
> > > > 2611         tp->snd_nxt = onxt;
> > > > 2612 goto drop;
> > > >
> > > > I'm wondering, in the SACK case, should CWND be set back to ssthresh (which has been slashed in cc_cong_signal() a few lines above) before line 2599, like the non-SACK case, instead of doing slow start from maxseg?
> > > > I read rfc6675 and a few others, and it looks like that's the case. I appreciate your opinion, again.
> > > >
> > > > Thanks,
> > > > Liang
> > > > _______________________________________________
> > > > freebsd-net@freebsd.org mailing list
> > > > https://lists.freebsd.org/mailman/listinfo/freebsd-net
> > > > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
> > > _______________________________________________
> > > freebsd-transport@freebsd.org mailing list
> > > https://lists.freebsd.org/mailman/listinfo/freebsd-transport
> > > To unsubscribe, send any mail to "freebsd-transport-unsubscribe@freebsd.org"
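The tinygram lengths in the trace quoted earlier in this thread (Len=120 after ten 1388-byte segments, Len=36 after three) are consistent with a send budget counted in units of t_maxseg while each segment carries only the effective payload. The sketch below works through that arithmetic. It assumes t_maxseg is 1400 bytes, inferred from the 1388-byte payload plus a 12-byte timestamp option and not stated anywhere in the thread; split_budget() and struct burst are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TS_OPT_LEN 12u	/* timestamp option with padding; assumed overhead */

/* Result of cutting a byte budget into segments of a given payload size. */
struct burst {
	uint32_t full_segments;	/* segments carrying a full payload */
	uint32_t tail_bytes;	/* leftover sent as a tinygram, 0 if none */
};

static struct burst
split_budget(uint32_t budget, uint32_t payload)
{
	struct burst b;

	assert(payload > 0);
	b.full_segments = budget / payload;
	b.tail_bytes = budget % payload;
	return (b);
}
```

A budget of 10 * 1400 bytes cut into 1388-byte payloads leaves a 120-byte tail, and 3 * 1400 leaves 36 bytes, matching the two short segments in the trace. Computing the budget from the effective payload instead (10 * 1388 or 3 * 1388) leaves no tail, which is the effect of using maxseg = tcp_maxseg(tp) throughout.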