Date: Sun, 30 Nov 2025 16:31:07 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: J David <j.david.lists@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: NFSv4.2 hangs on 14.3
Message-ID: <CAM5tNy4QUPjuhwF6oPko3M0uP10YWFZejT6h+gk_2di=cJnW2g@mail.gmail.com>
In-Reply-To: <CABXB=RSX0sxD=vAGis156PZzMEu-m4Kd5nQZv-FbogkctkHddQ@mail.gmail.com>
References: <CABXB=RQL0tqnE34G6PGLn6AmcwSpapm0-forQZ5vLBQBwcA12Q@mail.gmail.com>
 <CAM5tNy7eHH7qmTXLRQ9enDAwUzjUXtjugi093eUoRkDbGDCYVQ@mail.gmail.com>
 <CABXB=RQ6qSNp==Qa_m-=S8cKzxJU2pbuEDjeGfdr7L8Z0=dmGA@mail.gmail.com>
 <CABXB=RRHz20XwLDCz7qss1=0hXZK-SXz8X7pm4w8o8r2byxH2A@mail.gmail.com>
 <CAM5tNy6kQMtxe1Sdt_3yQv00ud-xMUsW1m52V2Gn6zy4tnka6Q@mail.gmail.com>
 <CABXB=RRDABxmgZMadGManyEO3ecy2x-myBZ8bbyjx7UePn+cLw@mail.gmail.com>
 <CAM5tNy65A7QzAS7Ww-dk9Eqx0_xvJAQDPnqEA4D8fWAyB+MU2Q@mail.gmail.com>
 <CABXB=RRH2QkkDiurNWZH8ZeJtCQHBz8XsKg9QjJ7Eg+oGSZguA@mail.gmail.com>
 <CAM5tNy5b7Eda2gwH-H9tzftqRcEsb07to1GD99ZPak4RQ9wYiA@mail.gmail.com>
 <CABXB=RSX0sxD=vAGis156PZzMEu-m4Kd5nQZv-FbogkctkHddQ@mail.gmail.com>
On Sun, Nov 30, 2025 at 3:20 PM J David <j.david.lists@gmail.com> wrote:
>
> On Sun, Nov 30, 2025 at 4:09 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> > Well "Initiate recovery.." means that the server replied with
> > an NFSERR_BADSESSION. This does not normally happen
> > unless the server reboots or the client does something that
> > is normally done upon dismount.
>
> Without knowing which server it's coming from, I don't know how we
> could check that further.
If there are multiple servers mounted, you would have to capture packets
from them all, which will make the pcap file even bigger.
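For example (server names are placeholders), one capture can cover
them all, filtered to port 2049 so that only NFS traffic is kept:

# tcpdump -s 0 -w out.pcap '(host server1 or host server2) and port 2049'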
(See below for one possible explanation as to why this is happening.)
>
> > You don't use something like the automounter, do you?
>
> Technically, yes. The automounter is used for one small server with
> a couple of directories of shared config information that is rarely
> consulted. However, there haven't been any problems with that, as far
> as I'm aware, in many years. So if the automounter is somehow causing
> these messages, it's a red herring for the server hang issue.
It's not the automounter per se, it is the mounts/dismounts and what
file systems they mount that might be causing your problems.
See below.
>
> > As for "expired locks lost", that means the client has
> > received an NFSERR_EXPIRED reply from a server.
> > (This normally only happens when the client is network
> > partitioned from the server for > 1 minute and, for the
> > FreeBSD server, another client makes a conflicting
> > lock request.)
>
> There's just no evidence that such a thing happened. If the client
> were unable to reach the server for a full minute, there would be all
> kinds of warnings and errors from the client code. (And I would
> probably expect to see the good old "NFS server blahblah not
> responding still trying" message somewhere.)
>
> > All I can say is that, from what I know, you shouldn't be seeing
> > what you are seeing. I can only conjecture that some sort
> > of network partitioning (or maybe repeated mounts/dismounts
> > if you are using an automounter) is causing this.
>
> We do have other repeated mounts/dismounts that aren't caused by the
> automounter.
>
> Some of our NFS servers have code and others have data. The ones that
> hang are the "code" servers, which are continuously mounted.
>
> Mounts are done against the "data" servers as needed. I.e., a job
> comes in, the relevant directory from the data server is mounted, the
> job runs, the directory is unmounted. I won't say we never have any
> problems with that, but it's way less frequent and only hangs the one
> job, whereas these "code" server hangs pretty much take down the whole
> client node.
That's fine for NFSv3. NFSv4 was designed so that a client would use
one mount for all of its NFS access to a server; that is why a single
NFSv4 mount can span multiple server file systems.
--> So long as the mounts are "one at a time" (never two mounts
    covering the same server file system concurrently), it should be ok.
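In other words, a single fstab entry at the server's NFSv4 root can
cover everything the server exports. A minimal sketch (server name
and mount point are made up):

  server:/  /mnt/server  nfs  rw,nfsv4  0  0

Each exported file system is then reachable under /mnt/server via
that one mount.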
>
> It might be important to restate that there is currently *no*
> correlation established between the "Initiate recovery" messages and
> our hanging mounts. They may very well be harmless.
They indicate a serious problem, since the servers are not rebooting.
You may have given me a hint when you state that there are mounts/dismounts
being done.
Every time an NFSv4 mount is done to a server, it appears to the
server as a different client. This was done so that there could be
multiple concurrent mounts of the same server, for example a setup
that imitates an NFSv3 environment, with a separate mount for
each file system.
This should not be a problem, so long as all dismounts are done
non-forced and there is never more than one NFSv4 mount that
covers the same file system at any time.
--> If the same server file system is visible under two NFSv4 mounts
    on the same client, that will cause problems, since the server
    sees two different clients accessing the same file system
    rather than one client.
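For example (paths made up), doing both of these on one client:

# mount -t nfs -o nfsv4 server:/ /mnt/all
# mount -t nfs -o nfsv4 server:/export/proj /mnt/proj

leaves /export/proj visible under both /mnt/all and /mnt/proj, so the
opens/locks done via the two mounts look to the server like they come
from two different clients.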
I'll admit it is a situation that I have never thought of, since in over
30 years as a sysadmin I always NFS mounted at boot (in /etc/fstab)
and never dismounted until the client was being shut down.
Because I never thought of such a case, there also isn't any
documentation warning users to avoid the situation.
(Unfortunately, NFSv4 maintains hard state and that hard state must
be tied to a "client". At one point long ago, I did do what the Linux
client does and managed "client state" separately from the mount point,
so that multiple mounts could refer to the same client state. The hassle
with this approach is that crash recovery becomes much more difficult,
since the state straddles multiple mount points.)
In summary, if your setup ever allows the same server file
system to be visible over more than one mount point in a client
at the same time, it's not going to work properly for NFSv4 mounts.
(NFSv3 mounts do not suffer from this, since there is no open/lock/session
state.)
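A quick way to check for this on a client is to list the active NFS
mounts and look for server paths that overlap or nest:

# mount -t nfs

("nfsstat -m" should also show the options actually in use for each
NFS mount.)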
rick
>
> It's only the "Wrong session" message that is demonstrably highly
> correlated with incidents of hanging mounts.
The patch I posted might stop the hangs, since in your case the
"Wrong session" replies do not appear to be caused by a broken server.
(Without the patch, the client disables session slots when it thinks
the server is broken. Once you run out of session slots, the mount
will hang. In practice this should never happen and, as I've noted,
there is something in your setup that is breaking things and making
it happen.)
>
> > # tcpdump -s 0 -w out.pcap host <nfs-server>
>
> This is probably not feasible because of the number of servers
> involved and the relative rarity of hangs. For us to get a hang every
> week or two means the individual nodes may go months between hangs.
>
> > Do you have nullfs mounts sitting on top of the NFS mounts?
>
> Yes, the "code" mounts use nullfs mounts, one per job.
>
> > There is a known issue that occurs when nullfs mounts are on
> > top of NFSv4 mounts,
>
> Yes, that came up for us and killed my initial attempt to deploy NFSv4
> in 2020. I thought it was fixed around FreeBSD 13? The OpenOwners
> issue is definitely still there, which requires us to use oneopenown,
> which in turn prevents us from using delegations, but that isn't
> specific to nullfs. Other than that... and this... NFS 4.2 has been
> pretty good to us.
Yes. oneopenown results in fewer open_owners and ensures there is
only one Open per file, which limits the impact.
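(For reference, oneopenown is just a mount option. A hypothetical
example with made-up names:

# mount -t nfs -o nfsv4,oneopenown server:/export/code /mnt/code

As noted above, it cannot be combined with delegations.)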
rick
>
> Thanks!
