From: tuexen@freebsd.org
Date: Sun, 11 Apr 2021 14:30:09 +0200
Subject: Re: NFS Mount Hangs
To: Rick Macklem
Cc: "Scheffenegger, Richard", Youssef GHORBAL, "freebsd-net@freebsd.org"
Message-Id: <23F49FD9-A8B6-460F-9CD2-BBC3181A058F@freebsd.org>

> On 10. Apr 2021, at 23:59, Rick Macklem wrote:
>
> tuexen@freebsd.org wrote:
>> Rick wrote:
> [stuff snipped]
>>>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
>> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on
>> the socket (until it acquires/processes an RPC it will not do a sosend()).
>> Without the 6minute timeout, the RST battle goes on "forever" (I've never actually
>> waited more than 30minutes, which is close enough to "forever" for me).
>> --> With the 6minute timeout, the "battle" stops after 6minutes, when the timeout
>> causes a soshutdown(..SHUT_WR) on the socket.
>> (Since the soshutdown() patch is not yet in "main". I got comments, but no "reviewed"
>> on it, the 6minute timer won't help if enabled in main. The soclose() won't happen
>> for TCP connections with the back channel enabled, such as Linux 4.1/4.2 ones.)
>> I'm confused. So you are saying that if the Send-Q is empty when you partition the
>> network, and the peer starts to send SYNs after the healing, FreeBSD responds
>> with a challenge ACK which triggers the sending of a RST by Linux. This RST is
>> ignored multiple times.
>> Is that true? Even with my patch for the bug I introduced?
> Yes and yes.
> Go take another look at linuxtofreenfs.pcap
> ("fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap" if you don't
> already have it.)
> Look at packet #1949->2069. I use wireshark, but you'll have your favourite.
> You'll see the "RST battle" that ends after
> 6minutes at packet#2069. If there is no 6minute timeout enabled in the
> server side krpc, then the battle just continues (I once let it run for about
> 30minutes before giving up). The 6minute timeout is not currently enabled
> in main, etc.

Hmm. I don't understand why r367492 can impact the processing of the RST, which
basically destroys the TCP connection.

Richard: Can you explain that?

Best regards
Michael
>
>> What version of the kernel are you using?
> "main" dated Dec. 23, 2020 + your bugfix + assorted NFS patches that
> are not relevant + 2 small krpc related patches.
> --> The two small krpc related patches enable the 6minute timeout and
> add a soshutdown(..SHUT_WR) call when the 6minute timeout is
> triggered. These have no effect until the 6minutes is up and, without
> them, the "RST battle" goes on forever.
>
> Add to the above a revert of r367492 and the RST battle goes away and things
> behave as expected. The recovery happens quickly after the network is
> unpartitioned, with either 0 or 1 RSTs.
>
> rick
> ps: Once the irrelevant NFS patches make it into "main", I will upgrade to
> main bits-du-jour for testing.
>
> Best regards
> Michael
>>
>> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
>>
>>>
>>> My understanding is that he needs this error indication when calling shutdown().
>> There are several ways the krpc notices that a TCP connection is no longer functional.
>> - An error return like EPIPE from either sosend() or soreceive().
>> - A return of 0 from soreceive() with no data (normal EOF from other end).
>> - A 6minute timeout on the server end, when no activity has occurred on the
>>   connection.
>>   This timer is currently disabled for NFSv4.1/4.2 mounts in "main",
>>   but I enabled it for this testing, to stop the "RST battle goes on forever"
>>   during testing. I am thinking of enabling it on "main", but this crude bandaid
>>   shouldn't be thought of as a "fix for the RST battle".
>>
>>>>
>>>> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path (draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack).)
>>>>
>>>> I only added the so_snd buffer after some discussion, if the WAKESOR shouldn't have a symmetric equivalent on WAKESOW....
>>>>
>>>> Thus a partial backout (leaving the WAKESOR part inside, but reverting the WAKESOW part) would still fix my initial problem about erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>>>
>>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
>> Since the krpc only uses receive upcalls, I don't see how reverting the send side would have
>> any effect?
>>
>>> Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
>> I think it has already shipped broken.
>> I don't know if an errata is possible, or if it will be broken until 13.1.
>>
>> --> I am much more concerned with the otis@ stuck client problem than this RST battle that only
>> occurs after a network partitioning, especially if it is 13.0 specific.
>> I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>> problem, which I failed to reproduce.
>>
>> rick
>>
>> Rs: agree, a good understanding of where the interaction between stack, socket and in-kernel TCP user breaks is needed;
>>
>>>
>>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled, to transmit DSACKs...
>>
>> My understanding is that the problem is related to getting a local error indication after
>> receiving a RST segment too late or not at all.
>>
>> Rs: but the move of the upcall should not materially change that; I don't have a PC here to see if any upcall actually happens on RST...
>>
>> Best regards
>> Michael
>>>
>>>
>>>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>>>
>>>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>>>
>>> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>>>
>>> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>>>
>>>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>>>
>>>> rick
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
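
For readers following along: the three ways Rick lists for the krpc to notice a dead TCP connection (an error such as EPIPE, a 0-byte receive meaning EOF, and an idle timeout followed by a SHUT_WR shutdown) have rough userland counterparts. The sketch below is only an analogy using ordinary Python sockets. It is not the krpc code, the helper names check_receive_side and check_send_side are made up for illustration, and the real timer is 6 minutes rather than the short timeout used here.

```python
import errno
import select
import socket


def check_receive_side(sock, idle_timeout):
    """Report how a stream socket looks from the receive side.

    Hypothetical userland helper for illustration; not krpc code.
    """
    # Wait for incoming activity, but no longer than the idle timeout.
    readable, _, _ = select.select([sock], [], [], idle_timeout)
    if not readable:
        # Nothing arrived within the timeout: half-close our sending
        # direction, the userland counterpart of soshutdown(so, SHUT_WR).
        sock.shutdown(socket.SHUT_WR)
        return "idle-timeout"
    try:
        data = sock.recv(4096)
    except OSError as exc:
        # Error return from the receive path.
        return "error:" + errno.errorcode.get(exc.errno, "unknown")
    if not data:
        # A 0-byte read with no error is a normal EOF from the other end.
        return "eof"
    return "data"


def check_send_side(sock, payload):
    """Report whether a send succeeds or fails with an errno name."""
    try:
        sock.sendall(payload)
    except OSError as exc:
        # For a peer that has gone away this is typically EPIPE.
        return "error:" + errno.errorcode.get(exc.errno, "unknown")
    return "ok"
```

With a connected pair from socket.socketpair(), closing one end makes check_receive_side() on the other end report "eof" and check_send_side() report an EPIPE error, while an idle pair hits the timeout branch. Note that none of these checks fire on their own when nothing is sent and nothing arrives, which is the Send-Q = 0 situation described above.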