FreeBSD Mail Archives

Date:      Wed, 17 Mar 2021 21:37:20 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Jason Breitman <jbreitman@tildenparkcapital.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: NFS Mount Hangs
Message-ID:  <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>
References:  <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com>, <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com>

Jason Breitman wrote:
>Please review the details below and let me know if there is a setting that I should
>apply to my FreeBSD NFS Server or if there is a bug fix that I can apply to resolve my
>issue.
>I shared this information with the linux-nfs mailing list and they believe the issue is
>on the server side.
I actually lurk there and saw your post. I'll admit I smiled when Trond argued
that a hung Linux system is the result of a server failing to send a fin/ack for
a closing TCP connection. But here are a few comments.

>Issue
>NFSv4 mounts periodically hang on the NFS Client.
>
>During this time, it is possible to manually mount from another NFS Server on the
>NFS Client having issues.
>Also, other NFS Clients are successfully mounting from the NFS Server in question.
>Rebooting the NFS Client appears to be the only solution.
>
>Environment
>NFS Server
>OS:             FreeBSD 12.1-RELEASE-p5
>
>NFS Client
>OS:             Debian Buster 10.8
>Kernel:         4.19.171-2
>Protocol:       NFSv4 with Kerberos Security
>Mount Options:  nfs-server.domain.com:/data     /mnt/data       nfs4    lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576    00
The maximum I/O size supported by FreeBSD is 128K.
The client should acquire the attributes that indicate that and set rsize/wsize
to that. "# nfsstat -m" on the client should show you what the client
is actually using. If it is larger than 128K, set both rsize and wsize to 128K.

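A minimal sketch of that rsize/wsize check, assuming 128K means 131072 bytes and that the mount options are available as a comma-separated string (as in the fstab line above). The helper name `check_io_size` is made up for illustration; it is not part of nfsstat or any NFS tool.

```shell
# 128K in bytes, the FreeBSD server limit mentioned above.
MAX_IO=$((128 * 1024))

# check_io_size OPTS (hypothetical helper)
# OPTS is a comma-separated mount option string. For each rsize/wsize
# option it prints the value and whether it exceeds MAX_IO.
check_io_size() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key val; do
        case "$key" in
            rsize|wsize)
                if [ "$val" -gt "$MAX_IO" ]; then
                    echo "$key=$val exceeds $MAX_IO"
                else
                    echo "$key=$val ok"
                fi
                ;;
        esac
    done
}

# Example with the options from the report:
# check_io_size "lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576"
```

If either value is reported as too large, the corresponding mount options would be rsize=131072,wsize=131072.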
>Output from the NFS Client when the issue occurs
># netstat -an | grep NFS.Server.IP.X
>tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049       FIN_WAIT2
I'm no TCP guy. Hopefully others might know why the client would be
stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a fin/ack,
but could be wrong?)

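For reference, in the TCP state diagram FIN_WAIT2 is the state where the local end has sent its FIN, has had that FIN acknowledged, and is waiting for the peer's FIN. A small sketch for spotting such connections to the NFS port in `netstat -an` output follows. The function name `fin_wait2_to` is hypothetical, and it assumes the Linux netstat column layout shown above (foreign address in column 5, state in the last column).

```shell
# fin_wait2_to PORT (hypothetical filter)
# Reads `netstat -an` output on stdin and prints the TCP lines whose
# foreign address ends in :PORT and whose state is FIN_WAIT2.
fin_wait2_to() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "FIN_WAIT2" {
            n = split($5, parts, ":")      # last ":"-separated field is the port
            if (parts[n] == port) print
        }'
}

# Example, on the Linux client:
# netstat -an | fin_wait2_to 2049
```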
># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
>netid: tcp
>addr:  NFS.Server.IP.X
>port:  2049
>state: 0x51
>
>syslog
>Mar  4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
>Mar  4 10:29:27 hostname kernel: [437414.133158] 57419 40a1      0 9b723c73 143cfadf    30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
I don't know what OPEN_NOATTR means, but I assume it is some variant
of NFSv4 Open operation.
[stuff snipped]
>Mar  4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar  4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
I have no idea what status -110 means?
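On the status value: Linux kernel status codes are negated errno numbers, and errno 110 on Linux is ETIMEDOUT ("Connection timed out"), which fits the "connect attempt timed out" line just before it. A small lookup sketch, with a hand-written table covering only a few network-related Linux errno values (the function name is made up for illustration):

```shell
# linux_errno_name STATUS (hypothetical helper)
# Maps a status such as -110 to its Linux errno name. The table is a
# hand-copied subset of Linux errno numbers, not a complete list.
linux_errno_name() {
    n=${1#-}                 # drop the leading minus sign, if any
    case "$n" in
        32)  echo EPIPE ;;
        104) echo ECONNRESET ;;
        107) echo ENOTCONN ;;
        110) echo ETIMEDOUT ;;
        111) echo ECONNREFUSED ;;
        113) echo EHOSTUNREACH ;;
        *)   echo "unknown ($n)" ;;
    esac
}

# Example:
# linux_errno_name -110
```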
>Mar  4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
>Mar  4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
>Mar  4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar  4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
>Mar  4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
>Mar  4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
>Mar  4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
>Mar  4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
>Mar  4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
>Mar  4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
Is it possible that the client is trying to (re)connect using the same client port#?
I would normally expect the client to create a new TCP connection using a
different client port# and then retry the outstanding RPCs.
--> Capturing packets when this happens would show us what is going on.

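One way such a capture could be taken is with tcpdump, sketched below. The wrapper name `capture_nfs`, the interface name, and the output path are placeholders, and it assumes NFS traffic on TCP port 2049 as in the netstat output above. It needs root and has to be running before the hang occurs.

```shell
# capture_nfs IFACE PEER OUTFILE (hypothetical wrapper; run as root)
# Writes full-size packets exchanged with PEER on port 2049 to OUTFILE,
# which can then be opened in wireshark.
capture_nfs() {
    tcpdump -i "$1" -s 0 -w "$3" host "$2" and port 2049
}

# Example on the FreeBSD server (em0 and the path are placeholders):
# capture_nfs em0 NFS.Client.IP.X /tmp/nfs-server.pcap
# Example on the Linux client:
# capture_nfs eth0 NFS.Server.IP.X /tmp/nfs-client.pcap
```

Filtering on the peer address and port keeps the capture file small enough to leave running until the next hang.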
If there is a problem on the FreeBSD end, it is most likely a broken
network device driver.
--> Try disabling TSO and LRO.
--> Try a different driver for the net hardware on the server.
--> Try a different net chip on the server.
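For the TSO/LRO suggestion, a sketch of how it might be done with FreeBSD's ifconfig(8). The wrapper names and the interface name `em0` are placeholders; the actual interface depends on the server's hardware. The change needs root and, done this way, does not persist across reboots.

```shell
# disable_offload IFACE (hypothetical wrapper; run as root on the server)
# Turns off TCP segmentation offload and large receive offload.
disable_offload() {
    ifconfig "$1" -tso -lro
}

# show_offload IFACE (hypothetical wrapper)
# Prints the interface's options line, to confirm that the TSO and LRO
# capabilities are no longer listed as enabled.
show_offload() {
    ifconfig "$1" | grep -i 'options='
}

# Example (em0 is a placeholder):
# disable_offload em0 && show_offload em0
```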
If you can capture packets when (not after) the hang
occurs, then you can look at them in wireshark and see
what is actually happening. (Ideally on both client and
server, to check that your network hasn't dropped anything.)
--> I know, if the hangs aren't easily reproducible, this isn't
    easily done.
--> Try a newer Linux kernel and see if the problem persists.
    The Linux folk will get more interested if you can reproduce
    the problem on 5.12. (Recent bakeathon testing of the 5.12
    kernel against the FreeBSD server did not find any issues.)

Hopefully the network folk have some insight w.r.t. why
the TCP connection is sitting in FIN_WAIT2.

rick

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB0968DC18E00833DE2969C636DD6A9>
