Date:      Fri, 6 Oct 2023 17:12:10 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        J David <j.david.lists@gmail.com>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: FreeBSD 13.2 NFS client mount hangs
Message-ID:  <CAM5tNy4P8bneozmjNwjJb=VhjLP0wMhAwJptWOJ1W3CLKdqB7g@mail.gmail.com>
In-Reply-To: <CABXB=RT14gofYHkMMr8cj%2BTy2QRUgn6zunho4T2Kq2NxAWmuAQ@mail.gmail.com>
References:  <CABXB=RRSHMhZQFL28eHKjhAYmU87qjpQ=B1=8VRSZoXat9=r5A@mail.gmail.com> <CAM5tNy4sqc18UCZF0vgL%2BXP6vF0wgt_3Yi07yY4wqeuzs6haMA@mail.gmail.com> <CABXB=RSUJ3mpYF5puAm0hSxeavozxyf7Ruab8mPrtBOu6bxM-w@mail.gmail.com> <CAM5tNy52x2s=9Os%2BPAa=-iz7F_o_4_9XxJbRAR28V1v9A4nN6A@mail.gmail.com> <CABXB=RT14gofYHkMMr8cj%2BTy2QRUgn6zunho4T2Kq2NxAWmuAQ@mail.gmail.com>

On Fri, Oct 6, 2023 at 10:48 AM J David <j.david.lists@gmail.com> wrote:
>
> On Mon, Oct 2, 2023 at 7:08 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > > The nfscbd daemon is not running on any of the clients.
> > >
> > > > If the Linux server still
> > > > issues delegations
> > >
> > > How would I determine that?
> > nfsstat -E -c
> > and then look at the number under "Delegs". It is a current count of
> > delegations, so if it remains 0 over time, no delegations are being issued.
>
> But if this is done from a client that is not running nfscbd, isn't it
> pretty well guaranteed to be zero?
Yes. If the NFS server is functioning correctly and there is no nfscbd
running, it should remain at 0.
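
A minimal sketch of how the "Delegs" check could be scripted, assuming
(this is an assumption, not something confirmed in this thread) that
"nfsstat -E -c" prints rows of column names, each followed by a row of
matching integer counts. The SAMPLE text is a made-up illustration and
not output from any real client; parse_counters and current_delegs are
hypothetical helper names.

```python
import subprocess

# Hypothetical excerpt laid out the way the counters are assumed to be
# printed: a row of names, then a row of integer values.
SAMPLE = """\
    OpenOwner        Opens    LockOwner        Locks       Delegs     LocalOwn
            3            5            0            0            0            0
"""


def parse_counters(text):
    """Pair each row of names with the all-numeric row directly below it."""
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines, lines[1:]):
        names = header.split()
        nums = values.split()
        if (names and len(names) == len(nums)
                and not any(n.isdigit() for n in names)
                and all(v.isdigit() for v in nums)):
            # If a name appears in more than one row, the last one wins.
            counters.update(zip(names, map(int, nums)))
    return counters


def current_delegs(label="Delegs"):
    """Run nfsstat on the client and return the count under `label`, or None."""
    out = subprocess.run(["nfsstat", "-E", "-c"], capture_output=True,
                         text=True, check=True).stdout
    return parse_counters(out).get(label)


if __name__ == "__main__":
    print(parse_counters(SAMPLE)["Delegs"])
```

Calling current_delegs() every few minutes and logging the result would
show whether the count ever leaves 0.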

>
> Checking all the clients I can find, "Deleg" is zero on all of them.
> On about half, "DelegRet" is nonzero but small (1-100), but I don't
> know what that is or if it's related.
It implies that the client is doing a recovery after an NFS server reboot
or something like that. Since no delegations are being issued, the DelegRet
operations would/should be related to such a recovery.

>
> > I have attached a small patch which should make the NFS client handle
> > this error correctly.
>
> I will look for a way to try this patch, but the clients in this case
> are all managed with freebsd-update and don't have enough disk space
> to build a kernel locally, so it may be tricky.
>
> > > > # tcpdump -s 0 -w out.pcap host <nfs-server-name>
> > > > Let this run for a while and then pull out.pcap into wireshark and see what
> > > > traffic is going between the NFS client and server.
> > > > (Unlike tcpdump, wireshark does know how to decode NFS properly.)
> > >
> > > If/when the issue happens again, I will attempt to do this and report back.
>
> I am also working on getting access to Wireshark.
>
> In the interim, it did happen again, so the best I can do is put a
> little bit of tcpdump output here: https://pastebin.com/UDrphwr5 .
This appears to be text. I need the actual pcap file captured by tcpdump.
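
As a rough way to tell a binary capture from tcpdump's text output before
sending it, one could look at the first four bytes of the file. The sketch
below checks for the classic pcap magic numbers (in either byte order) and
the pcapng section header block type. The function names are hypothetical.

```python
import struct

# Magic numbers at the start of a classic pcap savefile.
PCAP_MAGICS = {
    0xA1B2C3D4: "pcap (microsecond timestamps)",
    0xA1B23C4D: "pcap (nanosecond timestamps)",
}
# First four bytes of a pcapng file (the section header block type).
# The byte sequence reads the same in both byte orders.
PCAPNG_MAGIC = 0x0A0D0D0A


def capture_kind(first_bytes):
    """Return a description of the capture format, or None if unrecognised."""
    if len(first_bytes) < 4:
        return None
    for fmt in ("<I", ">I"):  # the writer may have used either byte order
        (magic,) = struct.unpack(fmt, first_bytes[:4])
        if magic in PCAP_MAGICS:
            return PCAP_MAGICS[magic]
    if struct.unpack(">I", first_bytes[:4])[0] == PCAPNG_MAGIC:
        return "pcapng"
    return None


def check_file(path):
    """Read the first four bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return capture_kind(f.read(4))
```

For a file written with "tcpdump -w out.pcap", check_file("out.pcap") would
be expected to return one of the pcap descriptions; a text paste returns
None.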

>
> I can't vouch for "correct" but it does mostly seem to decode the NFS packets.
>
> It seems to loop the same couple of actions with long delays (15
> seconds) between retries:
>
> This sequence:
> +0.0000s: Client -> server xid 1205841201 getattr fh 0,7/2 ("Getattr"
> in packet body)
> +1.4106s: Client -> server xid 1205841202 getattr fh 0,5/2 ("Renew" in
> packet body)
> +0.0002s: Server -> client xid 1205841202 getattr LNK 12231267145 ids
> 1/53 sz 0 ("Renew" in packet body)
> +3.8001s: Server -> client xid 1205841201 getattr ERROR: Request
> couldn't be completed in time ("Getattr" in packet body)
If the server is not replying to a Getattr, then it is broken. That
will certainly hang an NFS client.

>
> Repeats after 15 seconds:
> +15.0090s: client -> server 1205841203 getattr fh 0,7/2 ("Getattr" in
> packet body)
> ... etc
>
> The "fh 0,7/2" and "fh 0,5/2" seem to be consistent each time. The xid
> (transaction/request ID?) increments each time.
After 15sec, the client gives up on waiting for the reply from the server,
creates a new TCP connection and tries the RPC again.

Then it looks like the server does not reply again.

If you can give me the pcap file, I will take a look at it, but this
suggests a broken NFS server.
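
For anyone following along, here is a small sketch of matching calls to
replies by xid in the quoted sequence and measuring how long each RPC took.
It assumes (not confirmed above) that each "+N.NNNNs" offset is the delay
since the previous packet. TRACE is transcribed by hand from the quoted
text, and rpc_latencies is a hypothetical helper name.

```python
# (seconds since previous packet, direction, xid, label)
# Transcribed from the quoted tcpdump summary; the offsets are ASSUMED to be
# deltas from the previous line rather than absolute times.
TRACE = [
    (0.0000, "call", 1205841201, "Getattr"),
    (1.4106, "call", 1205841202, "Renew"),
    (0.0002, "reply", 1205841202, "Renew"),
    (3.8001, "reply", 1205841201, "Getattr: couldn't be completed in time"),
]


def rpc_latencies(trace):
    """Return ({xid: seconds from call to reply}, [xids with no reply])."""
    now = 0.0
    sent = {}
    latencies = {}
    for delta, direction, xid, _label in trace:
        now += delta
        if direction == "call":
            sent[xid] = now
        elif xid in sent:
            latencies[xid] = now - sent.pop(xid)
    return latencies, sorted(sent)


if __name__ == "__main__":
    latencies, unanswered = rpc_latencies(TRACE)
    for xid, secs in sorted(latencies.items()):
        print(f"xid {xid}: reply after {secs:.4f}s")
    print("unanswered:", unanswered)
```

Under that assumption the Renew is answered almost immediately, while the
Getattr reply arrives a little over 5 seconds after the call and carries
an error.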

rick

>
> Maybe that will provide a lucky flash of insight in the interim.
>
> Thanks!


