Date:      Thu, 18 Mar 2021 13:58:30 +0100
From:      tuexen@freebsd.org
To:        "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc:        Rick Macklem <rmacklem@uoguelph.ca>, Alan Somers <asomers@freebsd.org>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: NFS Mount Hangs
Message-ID:  <B11F5CD9-A026-4549-89A3-57D1E180C628@freebsd.org>
In-Reply-To: <202103181253.12ICrF35016815@gndrsh.dnsmgr.net>
References:  <202103181253.12ICrF35016815@gndrsh.dnsmgr.net>

> On 18. Mar 2021, at 13:53, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>
> Note I am NOT a TCP expert, but know enough about it to add a comment...
>
>> Alan Somers wrote:
>> [stuff snipped]
>>> Is the 128K limit related to MAXPHYS?  If so, it should be greater in 13.0.
>> For the client, yes. For the server, no.
>> For the server, it is just a compile-time constant, NFS_SRVMAXIO.
>>
>> It's mainly related to the fact that I haven't gotten around to testing
>> larger sizes yet.
>> - kern.ipc.maxsockbuf needs to be several times the limit, which means it
>>   would have to increase for 1Mbyte. (A sketch of the check follows below.)
>> - The session code must negotiate a maximum RPC size > 1 Mbyte.
>>   (I think the server code does do this, but it needs to be tested.)
>> And, yes, the client is limited to MAXPHYS.
>>
>> Doing this is on my todo list, rick
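For illustration, a minimal C sketch of the headroom check Rick describes.
The factor of four and the 1 Mbyte target are assumptions made up for the
example, not values anyone has tested in this thread:

    /* Check kern.ipc.maxsockbuf against several times a target NFS
     * I/O size, using FreeBSD's sysctlbyname(3). */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            unsigned long maxsockbuf;
            size_t len = sizeof(maxsockbuf);
            const unsigned long iosize = 1024 * 1024; /* hypothetical 1 Mbyte rsize/wsize */

            if (sysctlbyname("kern.ipc.maxsockbuf", &maxsockbuf, &len,
                NULL, 0) == -1) {
                    perror("sysctlbyname");
                    return (1);
            }
            printf("kern.ipc.maxsockbuf = %lu\n", maxsockbuf);
            if (maxsockbuf < 4 * iosize)    /* "several" taken as 4 here */
                    printf("likely too small for %lu byte I/O; consider "
                        "sysctl kern.ipc.maxsockbuf=%lu\n", iosize,
                        4 * iosize);
            return (0);
    }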
>>
>> The client should acquire the attributes that indicate the server's
>> maximum I/O size and set rsize/wsize to that. "# nfsstat -m" on the
>> client should show you what the client is actually using. If it is
>> larger than 128K, set both rsize and wsize to 128K.
>>
>>> Output from the NFS Client when the issue occurs
>>> # netstat -an | grep NFS.Server.IP.X
>>> tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049       FIN_WAIT2
>> I'm no TCP guy. Hopefully others might know why the client would be
>> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a
>> FIN/ACK, but could be wrong?)
>
> The most common way to get stuck in FIN_WAIT2 is to call
> shutdown(2) on a socket but never follow up with a
> close(2) after some timeout period.  The "client" is still
> connected to the socket and can stay in this shutdown state
> forever; the kernel will not reap the socket, as it is still
> associated with a process, i.e. not orphaned.  I suspect
> that the Linux client has a corner condition that is leading
> to this socket leak.
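To make that concrete, a minimal sketch of the pattern Rodney describes
(hypothetical address and port, error handling trimmed):

    /* shutdown(2) sends our FIN; once the peer ACKs it, this end sits
     * in FIN_WAIT2.  Because the process never calls close(2), the
     * socket is not orphaned and the kernel keeps the state forever. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in sin;

            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(2049);                     /* NFS */
            inet_pton(AF_INET, "192.0.2.1", &sin.sin_addr); /* example server */

            if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
                    return (1);
            shutdown(s, SHUT_WR);   /* ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 */
            /* Bug pattern: no close(2).  "netstat -an" now shows this
             * connection in FIN_WAIT2 for as long as we hold the fd. */
            pause();
            return (0);
    }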
>
> If on the Linux client you can look at the sockets to see
> if these are still associated with a process, a la fstat(1) on
> FreeBSD (e.g. lsof on Linux), that would be helpful.
> If they are in fact connected to a process, it is that
> process that must call close(2) to clean these up.
>
> IIRC the server side socket would be gone at this point
> and there is nothing the server can do that would allow
> a FIN_WAIT2 to close down.
Jason reported that the server is in CLOSE_WAIT. This would
mean that the server received the FIN, ACKed it, but has not
initiated the teardown of the server->client direction.
So the server-side socket is still there and close(2) has not
been called yet.
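The server-side counterpart, again just a sketch: once the client's FIN
has arrived, read(2) returns 0 and the socket stays in CLOSE_WAIT until
someone finally calls close(2):

    #include <sys/types.h>
    #include <unistd.h>

    /* Drain a connected socket; read(2) returning 0 means the peer's
     * FIN arrived and this end is now in CLOSE_WAIT. */
    static void
    drain_and_close(int s)
    {
            char buf[512];
            ssize_t n;

            while ((n = read(s, buf, sizeof(buf))) > 0)
                    ;               /* consume any remaining data */
            /* If the server never reaches this close(2), netstat shows
             * CLOSE_WAIT here and FIN_WAIT2 on the client -- exactly
             * the pair reported in this thread. */
            close(s);               /* CLOSE_WAIT -> LAST_ACK -> CLOSED */
    }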
>
> The real TCP experts can now correct my 30-year-old TCP
> stack understanding...
I wouldn't count myself as a real TCP expert, but the behaviour
hasn't changed in the last 30 years, I think...

Best regards
Michael
>
>>
>>> # cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>>> netid: tcp
>>> addr:  NFS.Server.IP.X
>>> port:  2049
>>> state: 0x51
>>>
>>> syslog
>>> Mar  4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- ->timeout ---ops--
>>> Mar  4 10:29:27 hostname kernel: [437414.133158] 57419 40a1      0 9b723c73 143cfadf    30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
>> I don't know what OPEN_NOATTR means, but I assume it is some variant
>> of NFSv4 Open operation.
>> [stuff snipped]
>>> Mar  4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>>> Mar  4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
>> I have no idea what status -110 means?
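A likely answer: the RPC status values here are negated Linux errnos,
and errno 110 on Linux is ETIMEDOUT, which matches the "connect attempt
timed out" line above. A quick check, assuming a Linux build host:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
            /* On Linux this prints "110: Connection timed out". */
            printf("%d: %s\n", ETIMEDOUT, strerror(ETIMEDOUT));
            return (0);
    }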
>>> Mar  4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>>> Mar  4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>>> Mar  4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>>> Mar  4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>>> Mar  4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>>> Mar  4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>>> Mar  4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>>> Mar  4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>>> Mar  4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>>> Mar  4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>> Is it possible that the client is trying to (re)connect using the same
>> client port#?
>> I would normally expect the client to create a new TCP connection using
>> a different client port# and then retry the outstanding RPCs.
>> --> Capturing packets when this happens would show us what is going on.
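If it helps, a hypothetical sketch of what such a same-port reconnect
looks like at the socket level (address, port and helper name are made
up for the example). While the old connection still occupies the same
4-tuple, e.g. stuck in FIN_WAIT2, the connect(2) here fails with
EADDRINUSE, which would stall retries just as described:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    static int
    reconnect_same_port(uint16_t srcport, const char *server_ip)
    {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            int on = 1;
            struct sockaddr_in src, dst;

            /* Reuse the previous source port, e.g. 46896 above. */
            setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
            memset(&src, 0, sizeof(src));
            src.sin_family = AF_INET;
            src.sin_port = htons(srcport);
            if (bind(s, (struct sockaddr *)&src, sizeof(src)) == -1)
                    perror("bind");

            memset(&dst, 0, sizeof(dst));
            dst.sin_family = AF_INET;
            dst.sin_port = htons(2049);
            inet_pton(AF_INET, server_ip, &dst.sin_addr);

            /* Fails with EADDRINUSE while an identical old 4-tuple
             * still exists. */
            if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) == -1)
                    perror("connect");
            return (s);
    }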
>>
>> If there is a problem on the FreeBSD end, it is most likely a broken
>> network device driver.
>> --> Try disabling TSO, LRO.
>> --> Try a different driver for the net hardware on the server.
>> --> Try a different net chip on the server.
>> If you can capture packets when (not after) the hang
>> occurs, then you can look at them in wireshark and see
>> what is actually happening. (Ideally on both client and
>> server, to check that your network hasn't dropped anything.)
>> --> I know, if the hangs aren't easily reproducible, this isn't
>>     easily done.
>> --> Try a newer Linux kernel and see if the problem persists.
>>     The Linux folk will get more interested if you can reproduce
>>     the problem on 5.12. (Recent bakeathon testing of the 5.12
>>     kernel against the FreeBSD server did not find any issues.)
>>
>> Hopefully the network folk have some insight w.r.t. why
>> the TCP connection is sitting in FIN_WAIT2.
>>
>> rick
>>
>> Jason Breitman
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



