Date: Tue, 16 Jun 2020 17:04:58 -0700
From: Christoph Paasch
To: "Scheffenegger, Richard"
Cc: "Cui, Cheng", Vidhi Goel, 'Michael Tuexen', freebsd-transport@freebsd.org
Subject: Re: SACK + RTO interaction
Message-id: <20200617000458.GK3048@MacBook-Pro-64.local>
List-Id: Discussions of transport level network protocols in FreeBSD

Hello,

Admittedly, this is a rather late reply, but as you can see my ToDo list is actually effective and things don't fall off the cliff ;-)

Thanks for bringing this up, Richard!

I tried to reproduce the scenario you described and, as far as I can see, we (Darwin) keep re-arming the timer, for two reasons:

1. If tcp_output won't transmit new data, it makes sure that the REXMT timer is armed:
   https://github.com/apple/darwin-xnu/blob/master/bsd/netinet/tcp_output.c#L1395

2. In tcp_input, we also make sure that the timer is armed:
   https://github.com/apple/darwin-xnu/blob/master/bsd/netinet/tcp_input.c#L4632

The packetdrill sequence I used to attempt a reproduction is:

    +0 > . 1:1001(1000) ack 1
    +0 > . 1001:2001(1000) ack 1
    +0 > . 2001:3001(1000) ack 1
    +0 > . 3001:4001(1000) ack 1
    +0 > . 4001:5001(1000) ack 1
    +0 > . 5001:6001(1000) ack 1
    +0 > . 6001:7001(1000) ack 1
    +0 > . 7001:8001(1000) ack 1
    +0 > . 8001:9001(1000) ack 1
    +0 > P. 9001:10001(1000) ack 1
    +0.01 < . 1:1(0) ack 1 win 4242
    +0 > . 1:1001(1000) ack 1
    +0 > . 1001:2001(1000) ack 1
    +0 > . 2001:3001(1000) ack 1
    +0 < . 1:1(0) ack 1001 win 4242
    +0.23 > . 1001:2001(1000) ack 1
    +0 < . 1:1(0) ack 2001 win 4242
    +0 > . 2001:3001(1000) ack 1

The first ack 1001 will enter tcp_sack_partialack and set REXMT to 0, but the later call to tcp_output will reset it to a valid value.

Or did you have a different sequence of events in mind?
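
To make the difference between the behaviours concrete, here is a minimal toy model in C. It is only a sketch and not kernel code: struct toy_tcpcb and every toy_* function are hypothetical names invented for illustration, and the only assumption carried over from this thread is that a REXMT value of 0 means "timer stopped". It contrasts a partial ACK that stops the timer, a partial ACK that restarts it with t_rxtcur (as in the quoted patch), and an output-path check that re-arms an idle timer while data is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state only; field names loosely echo the kernel's, nothing more. */
struct toy_tcpcb {
    uint32_t snd_una;     /* oldest unacknowledged sequence number */
    uint32_t snd_max;     /* highest sequence number sent so far */
    int      rexmt_ticks; /* remaining REXMT time; 0 means stopped */
    int      t_rxtcur;    /* current retransmit timeout, in ticks */
};

/* Partial ACK that stops the timer (the behaviour under discussion). */
static void toy_partialack_stop(struct toy_tcpcb *tp, uint32_t ack)
{
    tp->snd_una = ack;
    tp->rexmt_ticks = 0;
}

/* Partial ACK that restarts the timer (the behaviour the patch proposes). */
static void toy_partialack_restart(struct toy_tcpcb *tp, uint32_t ack)
{
    tp->snd_una = ack;
    tp->rexmt_ticks = tp->t_rxtcur;
}

/* Output-path safety net: re-arm an idle timer while data is outstanding. */
static void toy_output_rearm(struct toy_tcpcb *tp)
{
    assert(tp->t_rxtcur > 0);
    if (tp->snd_max != tp->snd_una && tp->rexmt_ticks == 0)
        tp->rexmt_ticks = tp->t_rxtcur;
}

/* True when the retransmit timer is currently running. */
static bool toy_rexmt_armed(const struct toy_tcpcb *tp)
{
    return tp->rexmt_ticks > 0;
}
```

In this model, calling toy_partialack_stop() followed by toy_output_rearm() leaves the timer armed as long as snd_una is behind snd_max, which is the effect described for the output path above. Without the re-arm step, only toy_partialack_restart() keeps a timer running after a partial ACK.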

Thanks,
Christoph

On 01/13/20 - 23:39, Scheffenegger, Richard wrote:
> Hi guys,
>
> I believe Cheng has uncovered another long-lurking bug, this time in the interaction between RTO and SACK.
>
> Since its inception, tcp_sack_partialack has apparently stopped the RTO timer, which doesn't seem right.
>
> What he observed is that if you run twice into a lost retransmission during the same SACK loss recovery episode (which is currently only recoverable by RTO), the initial loss is recovered by the RTO (unless a partial ACK disabled the timer before it fired), and the second, twice-lost segment is at the mercy of any other TCP timer that hopefully is still active (keepalive, persist, ...).
>
> https://reviews.freebsd.org/source/src/browse/head/sys/netinet/tcp_sack.c#782
>
> I strongly suspect that this should never have cancelled the RTO, but should instead reset it anew after a partial ACK. At least that would be more logical: pull the timeout forward if you are making some forward progress, rather than stop the timeout completely once one (of possibly many) retransmissions went through. If SACK loss recovery doesn't complete within an RTO timeout (which is many more RTTs than the single RTT a SACK loss recovery should take), it would be prudent to give up and fall back to RTO, no?
>
> This effect may also explain some of the other sporadic, very lengthy SACK recoveries we couldn't really pin down so far...
>
> The patch should be easy enough:
>   tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
> instead of
>   tcp_timer_activate(tp, TT_REXMT, 0);
>
> Any comments?
>
> BTW, I found the same in Darwin.
>
> Richard Scheffenegger
> Consulting Solution Architect
> NAS & Networking
>
> NetApp
> +43 1 3676 811 3157 Direct Phone
> +43 664 8866 1857 Mobile Phone
> Richard.Scheffenegger@netapp.com