From: tuexen@freebsd.org
Date: Fri, 19 Mar 2021 19:09:29 +0100
Subject: Re: NFS Mount Hangs
To: "Scheffenegger, Richard"
Cc: Rick Macklem, "freebsd-net@freebsd.org", Alexander Motin
Message-Id: <03DE00F1-B60D-49AE-AC53-C83BA9F0F5C7@freebsd.org>

> On 19. Mar 2021, at 17:14, Scheffenegger, Richard wrote:
>
> Hi Rick,
>
> I did some reshuffling of socket-upcalls recently in the TCP stack, to
> prevent some race conditions with our $work in-kernel NFS server
> implementation.

Are these changes in 12.1p5? This is the OS version used by the reporter
of the bug.

Best regards
Michael

>
> Just mentioning this, as this may slightly change the timing (mostly
> delay the upcall until TCP processing is all done, while before an
> in-kernel consumer could register for a socket upcall, do some fancy
> stuff with the data sitting in the socket buffers, before returning to
> the tcp processing).
>
> But I think there is no socket data handling being done in the
> upstream in-kernel NFS server (and I have not even checked if it
> actually registers a socket-upcall handler).
>
> https://reviews.freebsd.org/R10:4d0770f1725f84e8bcd059e6094b6bd29bed6cc3
>
> If you can reproduce this easily, perhaps back out this change and see
> if that has an impact...
>
> NFS server is to my knowledge the only upstream in-kernel TCP consumer
> which may be impacted by this.
>
> Richard Scheffenegger
>
>
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org On Behalf Of Rick Macklem
> Sent: Friday, 19 March 2021 16:58
> To: tuexen@freebsd.org
> Cc: Scheffenegger, Richard; freebsd-net@freebsd.org; Alexander Motin
> Subject: Re: NFS Mount Hangs
>
> Michael Tuexen wrote:
>>> On 18. Mar 2021, at 21:55, Rick Macklem wrote:
>>>
>>> Michael Tuexen wrote:
>>>>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard wrote:
>>>>>
>>>>>>> Output from the NFS Client when the issue occurs
>>>>>>> # netstat -an | grep NFS.Server.IP.X
>>>>>>> tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
>>>>>> I'm no TCP guy. Hopefully others might know why the client would
>>>>>> be stuck in FIN_WAIT2 (I vaguely recall this means it is waiting
>>>>>> for a fin/ack, but could be wrong?)
>>>>>
>>>>> FIN_WAIT2 is the state the client ends up in when the client side
>>>>> actively close()s the TCP session and the server has also ACKed
>>>>> the FIN.
>>> Jason noted:
>>>
>>>> When the issue occurs, this is what I see on the NFS Server.
>>>> tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 CLOSE_WAIT
>>>>
>>>> which corresponds to the state on the client side. The server
>>>> received the FIN from the client and acked it.
>>>> The server is waiting for a close call to happen.
>>>> So the question is: Is the server also closing the connection?
>>> Did you mean to say "client closing the connection here?"
>> Yes.
>>>
>>> The server should call soclose() { it never calls soshutdown() } when
>>> soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates
>>> the socket is broken.
> Btw, I looked and the soreceive() is done with MSG_DONTWAIT, but the
> EWOULDBLOCK is handled appropriately.
>
>>> --> The soreceive() call is triggered by an upcall for the rcv side
>>> of the socket.
>>> So, are you saying the FreeBSD NFS server did not call soclose() for
>>> this case?
>> Yes. If the state at the server side is CLOSE_WAIT, no close call has
>> happened yet.
>> The FIN from the client was received, it was ACKED, but no close() call
>> (or shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR)) was issued.
>> Therefore, no FIN was sent and the client should be in the FINWAIT-2
>> state. This was also reported. So the reported states are consistent.
> For a test, I commented out the soclose() call in the server side krpc
> and, when I dismounted, it did leave the server socket in CLOSE_WAIT.
> For the FreeBSD client, it did the dismount and the socket was in
> FIN_WAIT2 for a little while and then disappeared (someone mentioned a
> short timeout and that seems to be the case).
> I might argue that the Linux client should not get hung when this
> occurs, but there does appear to be an issue on the FreeBSD end.
>
> So it does appear you have a case where the soclose() call is not
> happening on the FreeBSD NFS server. I am a little surprised since I
> don't think I've heard of this before and the code is at least 10 years
> old (at least the parts related to this).
>
> For the soclose() to not happen, the reference count on the socket
> structure cannot have gone to zero (i.e. a SVC_RELEASE() was missed).
> Upon code inspection, I was not able to spot a reference counting bug.
> (Not too surprising, since a reference counting bug should have shown
> up long ago.)
>
> The only thing I spotted that could conceivably explain this is that
> the function svc_vc_stat(), which returns the indication that the
> socket has been closed at the other end, did not bother to do any
> locking when it checked the status. (I am not yet sure if this could
> result in the status of XPRT_DIED being missed by the call, but if so,
> that would result in the soclose() call not happening.)
>
> I have attached a small patch, which I think is safe, that adds
> locking to svc_vc_stat(), which I am hoping you can try at some point.
> (I realize this is difficult for a production server, but...) I have
> tested it a little and will test it some more, to try and ensure it
> does not break anything.
>
> I have also cc'd mav@, since he's the guy who last worked on this
> code, in case he has any insight w.r.t. how the soclose() might get
> missed (or any other way the server socket gets stuck in CLOSE_WAIT).
>
> rick
> ps: I'll create a PR for this, so that it doesn't get forgotten.
>
> Best regards
> Michael
>
>>
>> rick
>>
>> Best regards
>> Michael
>>> This will last for ~2 min or so, but is asynchronous. However, the
>>> same 4-tuple can not be reused during this time.
>>>
>>> In other words, from the socket / TCP, a properly executed active
>>> close() will end up in this state. (If the other side initiated the
>>> close, a passive close, will not end in this state)
>>>
>>>
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
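
The state pairing discussed in this thread (server in CLOSE_WAIT, client in FIN_WAIT2) can be reproduced in userland with a short sketch. This is only an illustration under stated assumptions: it uses ordinary Python sockets on loopback, not the kernel krpc code, and the function name half_close_demo is hypothetical. recv() returning zero bytes stands in for soreceive() returning 0 bytes, and conn.close() stands in for the soclose() call that appears to be missing on the server.

```python
import socket


def half_close_demo():
    """Userland analog of the thread's scenario (hypothetical helper)."""
    # Listening socket on loopback; port 0 lets the OS pick a free port.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()

    # Active close on the client side: the client sends a FIN.
    cli.close()

    # The server reads EOF (zero bytes) once the FIN has arrived.
    conn.settimeout(5.0)
    data = conn.recv(4096)
    eof_seen = (data == b"")

    # Until the server closes, its socket is in CLOSE_WAIT and the client
    # side sits in FIN_WAIT_2 (the FIN was ACKed, no FIN sent back yet).
    # Skipping the next call is the userland analog of a missed soclose().
    conn.close()  # the server now sends its own FIN
    srv.close()
    return eof_seen


if __name__ == "__main__":
    print("server saw EOF:", half_close_demo())
```

Pausing before conn.close() (for example with a sleep) and running netstat -an in another terminal should show the two states reported above for as long as the server side stays open.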