Date:      Fri, 8 Mar 2024 16:46:48 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        stable@freebsd.org
Subject:   Re: 13-stable NFS server hang
Message-ID:  <CAM5tNy7jEx3gfYaPNJ5ooACP7-yTLkraRMiBsgtP0Yc87UYwNw@mail.gmail.com>
In-Reply-To: <26090.36102.379531.160926@hergotha.csail.mit.edu>
References:  <26078.50375.679881.64018@hergotha.csail.mit.edu> <CAM5tNy7ZZ2bVLmYnOCWzrS9wq6yudoV5JKG5ObRU0=wLt1ofZw@mail.gmail.com> <26083.64612.717082.366639@hergotha.csail.mit.edu> <26090.36102.379531.160926@hergotha.csail.mit.edu>

On Thu, Mar 7, 2024 at 7:59 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> <<On Sat, 2 Mar 2024 23:28:20 -0500, I wrote:
>
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
>
> > If this does the trick, then I can delay deploying a new kernel until
> > April, after my upcoming vacation.
>
> Since zeroing dmu_offset_next_sync, I've seen about 8000 copy
> operations on the problematic server and no NFS work stoppages due to
> the copy.  I have observed a few others in a similar posture, where
> one client wants to ExchangeID and is waiting for other requests to
> drain, but nothing long enough to cause a service problem.[1]
>
> I think in general this choice to prefer "accurate" but very slow hole
> detection is a poor choice on the part of the OpenZFS developers, but
> so long as we can disable it, I don't think we need to change anything
> in the NFS server itself.
So the question is how this should be documented.
Perhaps in the BUGS section of the nfsd(8) man page?
What do others think?
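For anyone following along, here is a rough userland sketch of the kind of hole detection being discussed. It is not the kernel code. As I understand it, vn_copy_file_range() asks the file system for data/hole boundaries (the in-kernel analogue of lseek(2) with SEEK_DATA/SEEK_HOLE), and with vfs.zfs.dmu_offset_next_sync=1 ZFS may first wait for a dirty file's pending writes to reach disk so that the answer is accurate. The helper names data_bytes() and file_data_bytes() are hypothetical, made up only for this illustration.

```c
/* Hypothetical sketch: sum the data (non-hole) regions of a file using
 * lseek(2) SEEK_DATA/SEEK_HOLE. Not taken from FreeBSD or OpenZFS. */
#define _GNU_SOURCE /* exposes SEEK_DATA/SEEK_HOLE on glibc; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes in data regions of fd, or -1 on error.
 * Moves the file offset as a side effect. */
off_t data_bytes(int fd)
{
    assert(fd >= 0);
    off_t end = lseek(fd, 0, SEEK_END);
    if (end < 0)
        return -1;

    off_t total = 0;
    off_t pos = 0;
    while (pos < end) {
        /* Find the next offset at or after pos that holds data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO) /* only a hole remains before EOF */
                break;
            return -1;
        }
        /* Find where that data region ends (EOF counts as a hole). */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole <= data) /* should not happen; avoid looping forever */
            return -1;
        total += hole - data;
        pos = hole;
    }
    return total;
}

/* Convenience wrapper: open path read-only and report its data bytes. */
off_t file_data_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    off_t n = data_bytes(fd);
    close(fd);
    return n;
}
```

A file that was just written with the 5 bytes "hello" should report 5 data bytes. On file systems without real hole support, the whole file is reported as data, which is still correct, just less informative.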

>  It would be a good idea longer term to
> figure out a lock-free or synchronization-free way of handling these
> client session accept/teardown operations, because it is still a
> performance degradation, just not disruptive enough for users to
> notice.
Yes, as I've noted, it is on my todo list to take a look at it.

Good sleuthing, rick

>
> -GAWollman
>
> [1] Saw one with a slow nfsrv_readdirplus and another with a bunch of
> threads blocked on an upcall to nfsuserd.


