Date: Fri, 19 Mar 2021 19:09:29 +0100
From: tuexen@freebsd.org
To: "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc: Rick Macklem <rmacklem@uoguelph.ca>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Alexander Motin <mav@FreeBSD.org>
Subject: Re: NFS Mount Hangs
Message-ID: <03DE00F1-B60D-49AE-AC53-C83BA9F0F5C7@freebsd.org>
In-Reply-To: <SN4PR0601MB372895EE1F6DDFA830D4B7AC86689@SN4PR0601MB3728.namprd06.prod.outlook.com>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com>
 <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>
 <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
 <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB3728780CE9ADAB144B3B681486699@SN4PR0601MB3728.namprd06.prod.outlook.com>
 <2890D243-AF46-43A4-A1AD-CB0C3481511D@lurchi.franken.de>
 <YQXPR0101MB0968D2362456D43DF528A7E9DD699@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <9EE3DFAC-72B0-4256-B57C-DE6AA811413C@freebsd.org>
 <YQXPR0101MB0968E1537E26CDBDC31C58E5DD689@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
 <SN4PR0601MB372895EE1F6DDFA830D4B7AC86689@SN4PR0601MB3728.namprd06.prod.outlook.com>
> On 19. Mar 2021, at 17:14, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> 
> Hi Rick,
> 
> I did some reshuffling of socket upcalls recently in the TCP stack, to prevent some race conditions with our $work in-kernel NFS server implementation.

Are these changes in 12.1p5? This is the OS version used by the reporter of the bug.

Best regards
Michael

> Just mentioning this, as this may slightly change the timing (mostly delaying the upcall until TCP processing is all done, whereas before an in-kernel consumer could register for a socket upcall, do some fancy stuff with the data sitting in the socket buffers, before returning to the TCP processing).
> 
> But I think there is no socket data handling being done in the upstream in-kernel NFS server (and I have not even checked whether it actually registers a socket-upcall handler).
> 
> https://reviews.freebsd.org/R10:4d0770f1725f84e8bcd059e6094b6bd29bed6cc3
> 
> If you can reproduce this easily, perhaps back out this change and see if that has an impact...
> 
> The NFS server is, to my knowledge, the only upstream in-kernel TCP consumer which may be impacted by this.
> 
> Richard Scheffenegger
> 
> -----Original Message-----
> From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On Behalf Of Rick Macklem
> Sent: Friday, 19 March 2021 16:58
> To: tuexen@freebsd.org
> Cc: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; freebsd-net@freebsd.org; Alexander Motin <mav@FreeBSD.org>
> Subject: Re: NFS Mount Hangs
> 
> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
> 
> Michael Tuexen wrote:
>>> On 18. Mar 2021, at 21:55, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>> 
>>> Michael Tuexen wrote:
>>>>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>>>>> 
>>>>>>> Output from the NFS client when the issue occurs:
>>>>>>> # netstat -an | grep NFS.Server.IP.X
>>>>>>> tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
>>>>>> I'm no TCP guy. Hopefully others might know why the client would be stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a FIN/ACK, but could be wrong?)
>>>>> 
>>>>> FIN_WAIT2 is the state the client ends up in when it actively close()s the TCP session and the server has ACKed the client's FIN.
>>> Jason noted:
>>> 
>>>> When the issue occurs, this is what I see on the NFS server.
>>>> tcp4 0 0 NFS.Server.IP.X.2049 NFS.Client.IP.X.51550 CLOSE_WAIT
>>>> 
>>>> which corresponds to the state on the client side. The server received the FIN from the client and ACKed it.
>>>> The server is waiting for a close call to happen.
>>>> So the question is: is the server also closing the connection?
>>> Did you mean to say "client closing the connection" here?
>> Yes.
>>> 
>>> The server should call soclose() { it never calls soshutdown() } when soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates the socket is broken.
> Btw, I looked, and the soreceive() is done with MSG_DONTWAIT, but the EWOULDBLOCK is handled appropriately.
> 
>>> --> The soreceive() call is triggered by an upcall for the rcv side of the socket.
>>> So, are you saying the FreeBSD NFS server did not call soclose() for this case?
>> Yes. If the state on the server side is CLOSE_WAIT, no close call has happened yet.
>> The FIN from the client was received and ACKed, but no close() call (or shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR)) was issued.
>> Therefore, no FIN was sent and the client should be in the FIN_WAIT2 state. This was also reported. So the reported states are consistent.
> For a test, I commented out the soclose() call in the server-side krpc and, when I dismounted, it did leave the server socket in CLOSE_WAIT.
> For the FreeBSD client, it did the dismount and the socket was in FIN_WAIT2 for a little while and then disappeared (someone mentioned a short timeout, and that seems to be the case).
> I might argue that the Linux client should not get hung when this occurs, but there does appear to be an issue on the FreeBSD end.
> 
> So it does appear you have a case where the soclose() call is not happening on the FreeBSD NFS server. I am a little surprised, since I don't think I've heard of this before and the code is at least 10 years old (at least the parts related to this).
> 
> For the soclose() to not happen, the reference count on the socket structure cannot have gone to zero (i.e., a SVC_RELEASE() was missed). Upon code inspection, I was not able to spot a reference counting bug.
> (Not too surprising, since a reference counting bug should have shown up long ago.)
> 
> The only thing I spotted that could conceivably explain this is that the function svc_vc_stat(), which returns the indication that the socket has been closed at the other end, did not bother to do any locking when it checked the status. (I am not yet sure if this could result in the status of XPRT_DIED being missed by the call, but if so, that would result in the soclose() call not happening.)
> 
> I have attached a small patch, which I think is safe, that adds locking to svc_vc_stat(), which I am hoping you can try at some point.
> (I realize this is difficult for a production server, but...) I have tested it a little and will test it some more, to try and ensure it does not break anything.
> 
> I have also cc'd mav@, since he's the guy who last worked on this code, in case he has any insight w.r.t. how the soclose() might get missed (or any other way the server socket gets stuck in CLOSE_WAIT).
> 
> rick
> ps: I'll create a PR for this, so that it doesn't get forgotten.
> 
> Best regards
> Michael
> 
>> 
>> rick
>> 
>> Best regards
>> Michael
>>> This will last for ~2 min or so, but is asynchronous. However, the same 4-tuple can not be reused during this time.
>>> 
>>> In other words, from the socket / TCP side, a properly executed active close() will end up in this state. (If the other side initiated the close, a passive close, it will not end in this state.)
>>> 
>>> _______________________________________________
>>> freebsd-net@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"