Date:      Mon, 1 Dec 2025 11:39:33 -0500
From:      J David <j.david.lists@gmail.com>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFSv4.2 hangs on 14.3
Message-ID:  <CABXB=RQbeMTeVkmv8wb9ZpyhphynzLLBn340Fy2Po2OCA+xJHA@mail.gmail.com>
In-Reply-To: <CAM5tNy4QUPjuhwF6oPko3M0uP10YWFZejT6h+gk_2di=cJnW2g@mail.gmail.com>
References:  <CABXB=RQL0tqnE34G6PGLn6AmcwSpapm0-forQZ5vLBQBwcA12Q@mail.gmail.com> <CAM5tNy7eHH7qmTXLRQ9enDAwUzjUXtjugi093eUoRkDbGDCYVQ@mail.gmail.com> <CABXB=RQ6qSNp==Qa_m-=S8cKzxJU2pbuEDjeGfdr7L8Z0=dmGA@mail.gmail.com> <CABXB=RRHz20XwLDCz7qss1=0hXZK-SXz8X7pm4w8o8r2byxH2A@mail.gmail.com> <CAM5tNy6kQMtxe1Sdt_3yQv00ud-xMUsW1m52V2Gn6zy4tnka6Q@mail.gmail.com> <CABXB=RRDABxmgZMadGManyEO3ecy2x-myBZ8bbyjx7UePn+cLw@mail.gmail.com> <CAM5tNy65A7QzAS7Ww-dk9Eqx0_xvJAQDPnqEA4D8fWAyB+MU2Q@mail.gmail.com> <CABXB=RRH2QkkDiurNWZH8ZeJtCQHBz8XsKg9QjJ7Eg+oGSZguA@mail.gmail.com> <CAM5tNy5b7Eda2gwH-H9tzftqRcEsb07to1GD99ZPak4RQ9wYiA@mail.gmail.com> <CABXB=RSX0sxD=vAGis156PZzMEu-m4Kd5nQZv-FbogkctkHddQ@mail.gmail.com> <CAM5tNy4QUPjuhwF6oPko3M0uP10YWFZejT6h+gk_2di=cJnW2g@mail.gmail.com>

On Sun, Nov 30, 2025 at 7:31 PM Rick Macklem <rick.macklem@gmail.com> wrote:
> It's not the automounter per se, it is the mounts/dismounts and what
> file systems they mount that might be causing your problems.
> See below.

Yeah, the automounter mounts are very rare and guaranteed
non-overlapping/non-duplicated, so I think it is probably not that but
rather the other thing.

> This should not be a problem, so long as all dismounts are done
> non-forced

Dismounts are non-forced.

> and there is never more than one NFSv4 mount that
> covers the same file system at any time.

This *does* happen, but only for the "data" filesystems, which are not
the ones that are hanging.

The "code" filesystems, the ones that hang, use the nullfs mounts.

To give an incredibly brief and oversimplified/abstracted explanation:

The "code" filesystem is one filesystem mounted
"ro,nfsv4,minorversion=2,tcp,nosuid,noatime,nolockd,noresvport,oneopenown"
at boot as /code with various jail roots under it, like
/code/plumbing-worker, /code/electrical-worker,
/code/plastering-worker, etc.
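
For concreteness, the boot-time mount is roughly this fstab line (the
server name and export path here are made up; the options are exactly
the ones above):

    codesrv:/code  /code  nfs  ro,nfsv4,minorversion=2,tcp,nosuid,noatime,nolockd,noresvport,oneopenown  0  0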

The "data" filesystems are tens of thousands of subdirectories across
several servers. There is one mount for each server, like /data1,
/data2, /data3, etc. And then there is /data1/job-a, /data1/job-b,
/data2/job-c, etc. Those are mounted
"rw,tcp,nfsv4,minorversion=2,nosuid,noatime".

Now when a notification comes in, "do job-a," the client looks for
/data*/job-a. Then it reads /data1/job-a/type to discover that
it's an electrical job.

So the workflow looks like this (a rough shell sketch of steps 2-13
follows the list):
0) Wait around for a notification.
1) Receive a notification, "do job-a."
2) Look for "/data*/job-a."
3) Read "/data1/job-a/type" to determine the job type. (Say, "electrical")
4) In some cases, trivial jobs are handled directly against
/data1/job-a; when that happens, do the work there and return to step 0.
5) Create /workspace/job-a
6) Nullfs mount /code/electrical-worker at /workspace/job-a
7) Mount devfs at /workspace/job-a/dev
8) NFS mount data1:/job-a at /workspace/job-a/data
9) Run /workspace/job-a/do-your-job in a jail rooted at /workspace/job-a
10) Unmount /workspace/job-a/data
11) Unmount /workspace/job-a/dev
12) Unmount /workspace/job-a
13) Delete /workspace/job-a
14) Go to step 0.
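
As a plain-sh sketch (everything here is an illustrative stand-in for
our real tooling, and I've assumed the /workspace/job-a/data mount
uses the same options as the /dataN mounts):

    #!/bin/sh
    # Hypothetical sketch of steps 2-13 for one notification.
    job=job-a                       # step 1: from the notification

    set -- /data*/"$job"            # step 2: glob for the job dir
    jobdir=$1                       # e.g. /data1/job-a
    type=$(cat "$jobdir/type")      # step 3: e.g. "electrical"
    # step 4: trivial jobs would run directly against $jobdir here

    ws=/workspace/$job
    mkdir -p "$ws"                                         # step 5
    mount -t nullfs "/code/${type}-worker" "$ws"           # step 6
    mount -t devfs devfs "$ws/dev"                         # step 7
    mount -t nfs -o rw,tcp,nfsv4,minorversion=2,nosuid,noatime \
        "data1:/$job" "$ws/data"    # step 8 (data1 per the example)
    # step 9: command path is relative to the jail root
    jail -c path="$ws" command=/do-your-job
    umount "$ws/data"                                      # step 10
    umount "$ws/dev"                                       # step 11
    umount "$ws"                                           # step 12
    rmdir "$ws"                                            # step 13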

So, in this setup, the data1:/job-a directory is simultaneously
available as /data1/job-a and as /workspace/job-a/data, and both are
NFSv4.2 mounts. That sounds like it could trigger "Initiate
recovery..." under circumstances I don't fully understand, because it
counts as two clients with the same hostid, or at least two client
mounts coming from the same hostid.

I don't have a lot of control over the algorithm here. That is, the
"obvious" answer is "change it so it doesn't use those /data.../job-...
mounts," but that is likely to take a couple of years to implement, if
it's possible at all. And given that it doesn't *appear* to have any
ill effects, it might be tough to get such a project approved. (In
which case, it would be incredibly helpful if you could outline where
I might look for ill effects we're not aware of.)

We can't convert the /data1, /data2, /data3... mounts to NFSv3
because they are ZFS on the server and each job is a separate ZFS
filesystem; NFSv3 doesn't cross server-side filesystem boundaries, so
/data1 would just show a bunch of empty directories.
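
To illustrate (made-up export paths; NFSv4 crosses the server's
filesystem boundaries, NFSv3 does not):

    # NFSv4.2: one mount; the client descends into each job's
    # ZFS filesystem on the server
    mount -t nfs -o nfsv4,minorversion=2,tcp data1:/ /data1
    ls /data1/job-a    # real contents

    # NFSv3: the same mount shows only empty mountpoint directories;
    # every job would need its own mount:
    # mount -t nfs -o nfsv3 data1:/job-a /data1/job-a
    # ...times tens of thousands of jobs.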

We can't convert the /workspace/job-a/data mounts to NFSv3 because
those need locking to work.

We used to use nullfs for the data mounts, but had to stop because of
weird nullfs-over-NFS problems. Should I give that another shot? It
certainly hasn't been a problem with the /code filesystems. Unless
that's what's causing the hangs, which I guess we haven't ruled out.

Currently, the /data1, /data2, /data3... mounts are all rw with
locking. Would the situation improve if I could make them read-only?
(That's nontrivial because step 4 writes output to the job directory,
but I *might* be able to work around that.)

If there's anything else (under my control) that I could do in the NFS
configuration to improve the situation, I'm certainly open to trying.

Any ideas?

Thanks!


