Date:      Fri, 19 Mar 2021 16:14:01 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>, "tuexen@freebsd.org" <tuexen@freebsd.org>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Alexander Motin <mav@FreeBSD.org>
Subject:   AW: NFS Mount Hangs
Message-ID:  <SN4PR0601MB372895EE1F6DDFA830D4B7AC86689@SN4PR0601MB3728.namprd06.prod.outlook.com>
In-Reply-To: <YQXPR0101MB0968E1537E26CDBDC31C58E5DD689@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References:  <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com> <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com> <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <SN4PR0601MB3728780CE9ADAB144B3B681486699@SN4PR0601MB3728.namprd06.prod.outlook.com> <2890D243-AF46-43A4-A1AD-CB0C3481511D@lurchi.franken.de> <YQXPR0101MB0968D2362456D43DF528A7E9DD699@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>, <9EE3DFAC-72B0-4256-B57C-DE6AA811413C@freebsd.org> <YQXPR0101MB0968E1537E26CDBDC31C58E5DD689@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

Hi Rick,

I did some reshuffling of socket upcalls recently in the TCP stack, to prevent some race conditions with our $work in-kernel NFS server implementation.

Just mentioning this because it may slightly change the timing: the upcall is now mostly delayed until TCP processing is all done, whereas before an in-kernel consumer could register for a socket upcall, do some fancy stuff with the data sitting in the socket buffers, and then return to TCP processing.

But I think there is no socket data handling being done in the upstream in-kernel NFS server (and I have not even checked whether it actually registers a socket-upcall handler).
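
For illustration only (this is not the actual krpc code; the structure and handler names are made up), registering such a receive upcall looks roughly like this:

    #include <sys/param.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    struct my_conn;                         /* hypothetical per-connection state */

    static int
    my_rcv_upcall(struct socket *so, void *arg, int waitflag)
    {
            struct my_conn *conn = arg;

            /*
             * Invoked with the receive sockbuf lock held when data arrives
             * (after the change above, only once TCP input processing is done).
             * Typically this just schedules work; the real processing happens later.
             */
            (void)conn;
            /* e.g. taskqueue_enqueue(taskqueue_thread, &conn->task); */
            return (SU_OK);
    }

    static void
    my_register_upcall(struct socket *so, struct my_conn *conn)
    {
            SOCKBUF_LOCK(&so->so_rcv);
            soupcall_set(so, SO_RCV, my_rcv_upcall, conn);
            SOCKBUF_UNLOCK(&so->so_rcv);
    }

Such a handler runs in the TCP input path, which is exactly why the timing of when it fires can matter.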

https://reviews.freebsd.org/R10:4d0770f1725f84e8bcd059e6094b6bd29bed6cc3

If you can reproduce this easily, perhaps back out this change and see if that has an impact...

The NFS server is, to my knowledge, the only upstream in-kernel TCP consumer that may be impacted by this.

Richard Scheffenegger


-----Original Message-----
From: owner-freebsd-net@freebsd.org <owner-freebsd-net@freebsd.org> On behalf of Rick Macklem
Sent: Friday, 19 March 2021 16:58
To: tuexen@freebsd.org
Cc: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>; freebsd-net@freebsd.org; Alexander Motin <mav@FreeBSD.org>
Subject: Re: NFS Mount Hangs

Michael Tuexen wrote:
>> On 18. Mar 2021, at 21:55, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>
>> Michael Tuexen wrote:
>>>> On 18. Mar 2021, at 13:42, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>>>>
>>>>>> Output from the NFS Client when the issue occurs
>>>>>> # netstat -an | grep NFS.Server.IP.X
>>>>>> tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049       FIN_WAIT2
>>>>> I'm no TCP guy. Hopefully others might know why the client would
>>>>> be stuck in FIN_WAIT2 (I vaguely recall this means it is waiting
>>>>> for a fin/ack, but could be wrong?)
>>>>
>>>> FIN_WAIT2 is the state the client ends up in when it actively close()s the TCP session and the server has then ACKed the FIN.
>> Jason noted:
>>
>>> When the issue occurs, this is what I see on the NFS Server.
>>> tcp4       0      0 NFS.Server.IP.X.2049      NFS.Client.IP.X.51550     CLOSE_WAIT
>>>
>>> which corresponds to the state on the client side. The server
>>> received the FIN from the client and acked it.
>>> The server is waiting for a close call to happen.
>>> So the question is: Is the server also closing the connection?
>> Did you mean to say "client closing the connection" here?
>Yes.
>>
>> The server should call soclose() { it never calls soshutdown() } when
>> soreceive(with MSG_WAIT) returns 0 bytes or an error that indicates
>> the socket is broken.
Btw, I looked and the soreceive() is done with MSG_DONTWAIT, but the EWOULDBLOCK is handled appropriately.
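
Just to spell the pattern out (illustration only, not the actual krpc code; the helper names are made up): the upcall drives a non-blocking soreceive(), EWOULDBLOCK just means "nothing more right now", and a 0-byte/error return means the peer has closed, which is what should eventually lead to the soclose():

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/mbuf.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>
    #include <sys/uio.h>

    static void
    my_drain_socket(struct socket *so)
    {
            struct uio uio;
            struct mbuf *m;
            int error, rcvflag;

            for (;;) {
                    m = NULL;
                    rcvflag = MSG_DONTWAIT;
                    uio.uio_resid = 1000000000;     /* "read whatever is there" */
                    uio.uio_td = curthread;
                    error = soreceive(so, NULL, &uio, &m, NULL, &rcvflag);
                    if (error == EWOULDBLOCK)
                            return;                 /* no more data right now */
                    if (error != 0 || m == NULL) {
                            /*
                             * Error or EOF: the peer has closed its end.
                             * The connection must be marked dead so that the
                             * last reference release ends up in soclose(),
                             * moving the socket out of CLOSE_WAIT.
                             */
                            /* my_mark_dead(so);  -- hypothetical */
                            return;
                    }
                    /* my_process_record(m);  -- hypothetical, consumes data */
                    m_freem(m);
            }
    }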

>> --> The soreceive() call is triggered by an upcall for the rcv side of the socket.
>> So, are you saying the FreeBSD NFS server did not call soclose() for this case?
>Yes. If the state at the server side is CLOSE_WAIT, no close call has happened yet.
>The FIN from the client was received, it was ACKed, but no close() call
>(or shutdown(..., SHUT_WR) or shutdown(..., SHUT_RDWR)) was issued.
>Therefore, no FIN was sent and the client should be in the FIN_WAIT2
>state. This was also reported. So the reported states are consistent.
For a test, I commented out the soclose() call in the server-side krpc and, when I dismounted, it did leave the server socket in CLOSE_WAIT.
For the FreeBSD client, it did the dismount and the socket was in FIN_WAIT2 for a little while and then disappeared (someone mentioned a short timeout and that seems to be the case).
I might argue that the Linux client should not get hung when this occurs, but there does appear to be an issue on the FreeBSD end.

So it does appear you have a case where the soclose() call is not happening on the FreeBSD NFS server. I am a little surprised, since I don't think I've heard of this before and the code is at least 10 years old (at least the parts related to this).

For the soclose() to not happen, the reference count on the socket structure cannot have gone to zero (i.e., an SVC_RELEASE() was missed). Upon code inspection, I was not able to spot a reference counting bug.
(Not too surprising, since a reference counting bug should have shown up long ago.)
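
Roughly the pattern involved (hypothetical names, not the real sunrpc structures): the soclose() only happens when the last reference is released, so a missed release leaves the socket open and stuck in CLOSE_WAIT.

    #include <sys/param.h>
    #include <sys/refcount.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    struct my_xprt {
            volatile u_int   xp_refs;       /* managed with refcount(9) */
            struct socket   *xp_socket;
    };

    static void
    my_xprt_release(struct my_xprt *xprt)
    {
            if (refcount_release(&xprt->xp_refs)) {
                    /*
                     * Last reference dropped: tear the transport down and
                     * close the socket.  If a release is ever missed, this
                     * never runs and the socket stays in CLOSE_WAIT.
                     */
                    soclose(xprt->xp_socket);
                    /* free(xprt, M_MYXPRT);  -- allocation details omitted */
            }
    }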

The only thing I spotted that could conceivably explain this is that the function svc_vc_stat(), which returns the indication that the socket has been closed at the other end, did not bother to do any locking when it checked the status. (I am not yet sure if this could result in the status of XPRT_DIED being missed by the call, but if so, that would result in the soclose() call not happening.)

I have attached a small patch, which I think is safe, that adds locking to svc_vc_stat(), which I am hoping you can try at some point.
(I realize this is difficult for a production server, but...) I have tested it a little and will test it some more, to try and ensure it does not break anything.
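
To illustrate the kind of change I mean (this is not the attached patch; the structure and field names are simplified/hypothetical): read the stream status under the same lock the writer holds, so a concurrently set DIED indication cannot be missed.

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>

    enum my_xprt_stat { MY_XPRT_DIED, MY_XPRT_MOREREQS, MY_XPRT_IDLE };

    struct my_cf_conn {
            struct mtx              cc_lock;        /* protects cc_strm_stat */
            enum my_xprt_stat       cc_strm_stat;   /* set to MY_XPRT_DIED on EOF/error */
    };

    static enum my_xprt_stat
    my_vc_stat(struct my_cf_conn *cd)
    {
            enum my_xprt_stat stat;

            /*
             * Take the lock that the receive path holds when it marks the
             * connection dead; without it, this check can race with that
             * update and the DIED status could be missed.
             */
            mtx_lock(&cd->cc_lock);
            stat = cd->cc_strm_stat;
            mtx_unlock(&cd->cc_lock);

            return (stat);
    }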

I have also cc'd mav@, since he's the guy who last worked on this code, in case he has any insight w.r.t. how the soclose() might get missed (or any other way the server socket gets stuck in CLOSE_WAIT).

rick
ps: I'll create a PR for this, so that it doesn't get forgotten.

> Best regards
> Michael

>
> rick
>
> Best regards
> Michael
>> This will last for ~2 min or so, but is asynchronous. However, the same 4-tuple can not be reused during this time.
>>
>> In other words, from the socket/TCP perspective, a properly executed active
>> close() will end up in this state. (If the other side initiated the
>> close, i.e. a passive close, it will not end up in this state.)
>>
>>
>
>
