Date:      Wed, 17 Mar 2021 15:45:47 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Jason Breitman <jbreitman@tildenparkcapital.com>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: NFS Mount Hangs
Message-ID:  <CAOtMX2gQFMWbGKBzLcPW4zOBpQ3YR5=9DRpTyTDi2SC%2BhE8Ehw@mail.gmail.com>
In-Reply-To: <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References:  <C643BB9C-6B61-4DAC-8CF9-CE04EA7292D0@tildenparkcapital.com> <3750001D-3F1C-4D9A-A9D9-98BCA6CA65A4@tildenparkcapital.com> <33693DE3-7FF8-4FAB-9A75-75576B88A566@tildenparkcapital.com> <YQXPR0101MB0968DC18E00833DE2969C636DD6A9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

On Wed, Mar 17, 2021 at 3:37 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Jason Breitman wrote:
> >Please review the details below and let me know if there is a setting
> >that I should apply to my FreeBSD NFS Server or if there is a bug fix
> >that I can apply to resolve my issue.
> >I shared this information with the linux-nfs mailing list and they
> >believe the issue is on the server side.
> I actually lurk there and saw your post. I'll admit I smiled when Trond
> argued that a hung Linux system is the result of a server failing to send
> a fin/ack for a closing TCP connection. But here are a few comments.
>
> >Issue
> >NFSv4 mounts periodically hang on the NFS Client.
> >
> >During this time, it is possible to manually mount from another NFS
> >Server on the NFS Client having issues.
> >Also, other NFS Clients are successfully mounting from the NFS Server in
> >question.
> >Rebooting the NFS Client appears to be the only solution.
> >
> >Environment
> >NFS Server
> >OS:          FreeBSD 12.1-RELEASE-p5
> >
> >NFS Client
> >OS:             Debian Buster 10.8
> >Kernel: 4.19.171-2
> >Protocol:       NFSv4 with Kerberos Security
> >Mount Options:  nfs-server.domain.com:/data  /mnt/data  nfs4  lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576  0 0
> The maximum I/O size supported by FreeBSD is 128K.
>

Is the 128K limit related to MAXPHYS?  If so, it should be greater in 13.0.


> The client should acquire the attributes that indicate that and set
> rsize/wsize to that. "# nfsstat -m" on the client should show you what
> the client is actually using. If it is larger than 128K, set both rsize
> and wsize to 128K.
>
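A minimal sketch of the rsize/wsize check suggested above, in Python. It
only inspects an options string such as the one in the quoted mount entry
(or one copied from "nfsstat -m"); it does not query the client itself. The
128K limit is the figure given in this thread, and the function names are
illustrative.

```python
import re

# 128K, the FreeBSD server maximum I/O size stated in this thread.
FREEBSD_MAX_IO = 128 * 1024


def parse_io_sizes(mount_options):
    """Return the rsize/wsize values found in a comma-separated options string."""
    sizes = {}
    for key, value in re.findall(r"\b(rsize|wsize)=(\d+)", mount_options):
        sizes[key] = int(value)
    return sizes


def oversized(mount_options, limit=FREEBSD_MAX_IO):
    """List the options whose value exceeds the given limit."""
    sizes = parse_io_sizes(mount_options)
    return sorted(key for key, value in sizes.items() if value > limit)


# Options copied from the quoted mount entry.
opts = "lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576"
print(oversized(opts))                                # ['rsize', 'wsize']
print(oversized("hard,rsize=131072,wsize=131072"))    # []
```

If both names are reported, the mount asks for more than the server is said
to support, and pinning both options to 131072 matches the advice above.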
> >Output from the NFS Client when the issue occurs
> ># netstat -an | grep NFS.Server.IP.X
> >tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049      FIN_WAIT2
> I'm no TCP guy. Hopefully others might know why the client would be
> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a
> fin/ack, but could be wrong?)
>
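For reference, in the TCP state machine a socket enters FIN_WAIT2 once its
own FIN has been acknowledged; it then waits for the peer's FIN. A rough
Python sketch for spotting such sockets in saved "netstat -an" output
follows. It assumes the six-column Linux layout shown in the quoted line;
the second sample row is made up for contrast, and the function name is
illustrative.

```python
def fin_wait2_to_nfs(netstat_output, nfs_port="2049"):
    """Return (local, foreign) pairs in FIN_WAIT2 whose foreign port is nfs_port."""
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed Linux layout: proto recv-q send-q local foreign state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "FIN_WAIT2" and foreign.rsplit(":", 1)[-1] == nfs_port:
            hits.append((local, foreign))
    return hits


# First row is the quoted line; the second row is hypothetical.
sample = """\
tcp        0      0 NFS.Client.IP.X:46896      NFS.Server.IP.X:2049      FIN_WAIT2
tcp        0      0 NFS.Client.IP.X:22         Admin.Host.IP.X:50514     ESTABLISHED
"""
print(fin_wait2_to_nfs(sample))
# [('NFS.Client.IP.X:46896', 'NFS.Server.IP.X:2049')]
```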
> ># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
> >netid: tcp
> >addr:  NFS.Server.IP.X
> >port:  2049
> >state: 0x51
> >
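The "state: 0x51" value is a bit mask. A small Python sketch for decoding
it follows. The bit names are an assumption taken from the XPRT_* defines
in include/linux/sunrpc/xprt.h as they appear in 4.19-era kernels; they are
not stated anywhere in this thread and should be checked against the source
of the kernel actually running.

```python
# Assumed bit positions (Linux 4.19-era include/linux/sunrpc/xprt.h).
# Verify against the running kernel's source before relying on the names.
XPRT_BITS = {
    0: "XPRT_LOCKED",
    1: "XPRT_CONNECTED",
    2: "XPRT_CONNECTING",
    3: "XPRT_CLOSE_WAIT",
    4: "XPRT_BOUND",
    5: "XPRT_BINDING",
    6: "XPRT_CLOSING",
    9: "XPRT_CONGESTED",
}


def decode_xprt_state(state):
    """Return the names of the bits set in an rpc_xprt state word."""
    names = []
    bit = 0
    while state >> bit:
        if state & (1 << bit):
            names.append(XPRT_BITS.get(bit, "bit%d" % bit))
        bit += 1
    return names


print(decode_xprt_state(0x51))
# ['XPRT_LOCKED', 'XPRT_BOUND', 'XPRT_CLOSING']
```

Under that assumption the connected bit (bit 1) is clear, which agrees with
the "is not connected" messages in the syslog excerpt.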
> >syslog
> >Mar  4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
> >Mar  4 10:29:27 hostname kernel: [437414.133158] 57419 40a1      0 9b723c73 143cfadf    30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
> I don't know what OPEN_NOATTR means, but I assume it is some variant
> of NFSv4 Open operation.
> [stuff snipped]
> >Mar  4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar  4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
> I have no idea what status -110 means?
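On Linux, kernel status values like this are negated errno numbers, and
errno 110 is ETIMEDOUT ("Connection timed out"), which fits the "connect
attempt timed out" line just before it. A minimal Python sketch with a
small hand-written table of Linux errno values follows; the table is
deliberately hardcoded because errno numbering differs between operating
systems, and the function name is illustrative.

```python
# Linux errno values, hardcoded on purpose: other systems number them differently.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
    111: ("ECONNREFUSED", "Connection refused"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def describe_status(status):
    """Translate a negative sunrpc status into a Linux errno name and message."""
    if status >= 0:
        return "status %d (not an error)" % status
    name, text = LINUX_ERRNO.get(-status, ("unknown", "not in this table"))
    return "status %d = -%s (%s)" % (status, name, text)


print(describe_status(-110))  # status -110 = -ETIMEDOUT (Connection timed out)
print(describe_status(0))     # status 0 (not an error)
```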
> >Mar  4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
> >Mar  4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
> >Mar  4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar  4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> >Mar  4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar  4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
> >Mar  4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
> >Mar  4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
> >Mar  4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar  4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> Is it possible that the client is trying to (re)connect using the same
> client port#? I would normally expect the client to create a new TCP
> connection using a different client port# and then retry the outstanding
> RPCs.
> --> Capturing packets when this happens would show us what is going on.
>
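Until a packet capture is available, the quoted syslog at least shows how
far apart the failed connect attempts are. A short Python sketch follows
that pulls the kernel timestamps from the "connect attempt timed out"
entries and prints the gap between them. The sample lines are the quoted
entries with the mail wrapping undone; the function name is illustrative.

```python
import re

# Quoted syslog entries, with the e-mail line wrapping removed.
LOG = """\
Mar  4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
Mar  4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
Mar  4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
"""

TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] RPC:\s+\d+ xprt_connect_status: connect attempt timed out"
)


def timeout_gaps(log_text):
    """Seconds between consecutive 'connect attempt timed out' entries."""
    stamps = [float(m.group(1)) for m in TIMEOUT_RE.finditer(log_text)]
    return [round(later - earlier, 2) for earlier, later in zip(stamps, stamps[1:])]


print(timeout_gaps(LOG))  # [61.44]
```

So the two timeouts in the excerpt are roughly a minute apart. This says
nothing about which client port each attempt used; only a capture can
answer that.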
> If there is a problem on the FreeBSD end, it is most likely a broken
> network device driver.
> --> Try disabling TSO, LRO.
> --> Try a different driver for the net hardware on the server.
> --> Try a different net chip on the server.
> If you can capture packets when (not after) the hang
> occurs, then you can look at them in wireshark and see
> what is actually happening. (Ideally on both client and
> server, to check that your network hasn't dropped anything.)
> --> I know, if the hangs aren't easily reproducible, this isn't
>     easily done.
> --> Try a newer Linux kernel and see if the problem persists.
>     The Linux folk will get more interested if you can reproduce
>     the problem on 5.12. (Recent bakeathon testing of the 5.12
>     kernel against the FreeBSD server did not find any issues.)
>
> Hopefully the network folk have some insight w.r.t. why
> the TCP connection is sitting in FIN_WAIT2.
>
> rick
>
> Jason Breitman
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>


