Date: Wed, 17 Mar 2021 15:45:47 -0600
From: Alan Somers <asomers@freebsd.org>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Jason Breitman <jbreitman@tildenparkcapital.com>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: NFS Mount Hangs
Message-ID: <CAOtMX2gQFMWbGKBzLcPW4zOBpQ3YR5=9DRpTyTDi2SC%2BhE8Ehw@mail.gmail.com>
In-Reply-To: <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References: <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com> <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com> <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
On Wed, Mar 17, 2021 at 3:37 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Jason Breitman wrote:
> >Please review the details below and let me know if there is a setting
> >that I should apply to my FreeBSD NFS Server or if there is a bug fix
> >that I can apply to resolve my issue.
> >I shared this information with the linux-nfs mailing list and they
> >believe the issue is on the server side.
> I actually lurk there and saw your post. I'll admit I smiled when Trond
> argued that a hung Linux system is the result of a server failing to
> send a fin/ack for a closing TCP connection. But, here's a few comments..
>
> >Issue
> >NFSv4 mounts periodically hang on the NFS Client.
> >
> >During this time, it is possible to manually mount from another NFS
> >Server on the NFS Client having issues.
> >Also, other NFS Clients are successfully mounting from the NFS Server
> >in question.
> >Rebooting the NFS Client appears to be the only solution.
> >
> >Environment
> >NFS Server
> >OS: FreeBSD 12.1-RELEASE-p5
> >
> >NFS Client
> >OS: Debian Buster 10.8
> >Kernel: 4.19.171-2
> >Protocol: NFSv4 with Kerberos Security
> >Mount Options: nfs-server.domain.com:/data /mnt/data nfs4
> >lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576 00
> The maximum I/O size supported by FreeBSD is 128K.

Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.

> The client should acquire the attributes that indicate that and set
> rsize/wsize to that. "# nfsstat -m" on the client should show you what
> the client is actually using. If it is larger than 128K, set both rsize
> and wsize to 128K.
>
> >Output from the NFS Client when the issue occurs
> ># netstat -an | grep NFS.Server.IP.X
> >tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
> I'm no TCP guy. Hopefully others might know why the client would be
> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a
> fin/ack, but could be wrong?)
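
A side note on the netstat output quoted above: in the TCP state machine, FIN_WAIT2 means the local end has sent its FIN, had it acknowledged, and is now waiting for the peer's own FIN. Below is a minimal Python sketch for picking such connections out of `netstat -an` output. The function name `stuck_nfs_connections` and the sample text are hypothetical, and the six-column row layout is an assumption based on the Linux output quoted above.

```python
def stuck_nfs_connections(netstat_output, server_port=2049, state="FIN_WAIT2"):
    """Return (local, foreign) address pairs for TCP rows in the given state.

    Assumes Linux-style `netstat -an` TCP rows with six whitespace-separated
    columns: proto, recv-q, send-q, local address, foreign address, state.
    """
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and anything that is not a 6-column TCP row.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, _recvq, _sendq, local, foreign, conn_state = fields
        # The port is whatever follows the last ':' in the foreign address.
        foreign_port = foreign.rsplit(":", 1)[-1]
        if foreign_port == str(server_port) and conn_state == state:
            matches.append((local, foreign))
    return matches


# Hypothetical sample modelled on the row quoted in this thread.
sample = """\
Proto Recv-Q Send-Q Local Address          Foreign Address        State
tcp        0      0 NFS.Client.IP.X:46896  NFS.Server.IP.X:2049   FIN_WAIT2
tcp        0      0 NFS.Client.IP.X:22     Some.Other.IP.X:51515  ESTABLISHED
"""

for local, foreign in stuck_nfs_connections(sample):
    print(f"{local} -> {foreign} is in FIN_WAIT2")
```

Running something like this periodically on the client would show whether the same client port stays in FIN_WAIT2 for the whole duration of a hang.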
> >
> ># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
> >netid: tcp
> >addr: NFS.Server.IP.X
> >port: 2049
> >state: 0x51
> >
> >syslog
> >Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
> >Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
> I don't know what OPEN_NOATTR means, but I assume it is some variant
> of NFSv4 Open operation.
> [stuff snipped]
> >Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
> I have no idea what status -110 means?
> >Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
> >Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
> >Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> >Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
> >Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
> >Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
> >Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> Is it possible that the client is trying to (re)connect using the same
> client port#?
> I would normally expect the client to create a new TCP connection using a
> different client port# and then retry the outstanding RPCs.
> --> Capturing packets when this happens would show us what is going on.
>
> If there is a problem on the FreeBSD end, it is most likely a broken
> network device driver.
> --> Try disabling TSO, LRO.
> --> Try a different driver for the net hardware on the server.
> --> Try a different net chip on the server.
> If you can capture packets when (not after) the hang
> occurs, then you can look at them in wireshark and see
> what is actually happening. (Ideally on both client and
> server, to check that your network hasn't dropped anything.)
> --> I know, if the hangs aren't easily reproducible, this isn't
> easily done.
> --> Try a newer Linux kernel and see if the problem persists.
> The Linux folk will get more interested if you can reproduce
> the problem on 5.12. (Recent bakeathon testing of the 5.12
> kernel against the FreeBSD server did not find any issues.)
>
> Hopefully the network folk have some insight w.r.t. why
> the TCP connection is sitting in FIN_WAIT2.
>
> rick
>
> > Jason Breitman
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
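
On the question of `status -110` in the quoted RPC debug log: the Linux kernel reports errors as negated errno values, and errno 110 on Linux is ETIMEDOUT, which agrees with the "connect attempt timed out" lines next to it. A small Python sketch of the lookup follows. The names `LINUX_ERRNO` and `decode_rpc_status` are hypothetical, and the table is hard-coded on the assumption of the generic Linux numbering used on x86 and most other architectures (a few architectures number these differently).

```python
# Assumption: generic Linux errno numbering (asm-generic/errno.h), hard-coded
# so the result does not depend on the OS this script happens to run on.
LINUX_ERRNO = {
    104: "ECONNRESET",
    107: "ENOTCONN",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}


def decode_rpc_status(status):
    """Translate a status value from the sunrpc debug log into an errno name.

    Zero is treated as success; negative values are negated errno numbers.
    Numbers missing from the small table above are reported as unknown.
    """
    if status == 0:
        return "OK"
    if status > 0:
        return f"not an error code ({status})"
    return LINUX_ERRNO.get(-status, f"unknown errno {-status}")


print(decode_rpc_status(-110))
print(decode_rpc_status(0))
```

So the repeated `call_connect_status (status -110)` entries say that each reconnect attempt timed out, rather than being refused or reset.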