Date:      Sun, 3 Mar 2024 15:27:04 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        stable@freebsd.org
Subject:   Re: 13-stable NFS server hang
Message-ID:  <CAM5tNy47qzCSCxUik3LyV=VtpYGgaLoWehP4AeJCXz0ik0JGaw@mail.gmail.com>
In-Reply-To: <CAM5tNy4BM3fwccjF53ROP-7NojsWMM2fUY2_RA-4GMWfc6Sn4g@mail.gmail.com>
References:  <26078.50375.679881.64018@hergotha.csail.mit.edu> <CAM5tNy7ZZ2bVLmYnOCWzrS9wq6yudoV5JKG5ObRU0=wLt1ofZw@mail.gmail.com> <26083.64612.717082.366639@hergotha.csail.mit.edu> <CAM5tNy4BM3fwccjF53ROP-7NojsWMM2fUY2_RA-4GMWfc6Sn4g@mail.gmail.com>

On Sun, Mar 3, 2024 at 1:17 PM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman <wollman@bimajority.org> wrote:
> >
> >
> > I wrote previously:
> > > PID    TID COMM                TDNAME              KSTACK
> > > 997 108481 nfsd                nfsd: master        mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > > 997 960918 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > > 997 962232 nfsd                nfsd: service       mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> >
> > I spent some time this evening looking at this last stack trace, and
> > stumbled across the following comment in
> > sys/contrib/openzfs/module/zfs/dmu.c:
> >
> > | /*
> > |  * Enable/disable forcing txg sync when dirty checking for holes with lseek().
> > |  * By default this is enabled to ensure accurate hole reporting, it can result
> > |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy workloads.
> > |  * Disabling this option will result in holes never being reported in dirty
> > |  * files which is always safe.
> > |  */
> > | int zfs_dmu_offset_next_sync = 1;
> >
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
> >
> > If this does the trick, then I can delay deploying a new kernel until
> > April, after my upcoming vacation.
> Interesting. Please let us know how it goes.
Btw, I just tried this for my trivial test and it worked very well.
A 1Gbyte file was copied in two Copy RPCs, one taking 1sec and the other
slightly less than 1sec.
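
For anyone who wants to repeat this kind of check from an NFSv4.2 client, here is a minimal sketch of a timing loop around copy_file_range(2). It is an illustration only, not the test program used above: the function name timed_copy and the chunk parameter are made up for the example, and it times the client-side syscalls, which need not map one-to-one onto Copy RPCs on the wire.

```c
/*
 * Sketch only: time each copy_file_range(2) call while copying a file.
 * timed_copy() and its chunk parameter are hypothetical names chosen
 * for this example.
 */
#define _GNU_SOURCE	/* exposes copy_file_range() on Linux/glibc */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Copy from the current offset of infd to the current offset of outfd,
 * asking for at most `chunk` bytes per call, and report on stderr how
 * long each call took.  Returns the total number of bytes copied, or -1
 * on error (errno is left as set by copy_file_range).
 */
static off_t
timed_copy(int infd, int outfd, size_t chunk)
{
	struct timespec t0, t1;
	off_t total = 0;
	ssize_t n;

	assert(chunk > 0);
	for (;;) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		/* NULL offsets: use and advance the descriptors' own offsets. */
		n = copy_file_range(infd, NULL, outfd, NULL, chunk, 0);
		clock_gettime(CLOCK_MONOTONIC, &t1);
		if (n < 0)
			return (-1);
		if (n == 0)	/* end of the input file */
			break;
		total += n;
		fprintf(stderr, "copied %zd bytes in %.3f s\n", n,
		    (double)(t1.tv_sec - t0.tv_sec) +
		    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
	}
	return (total);
}
```

Calling it with descriptors for a recently written (still dirty) source file on the NFS mount, before and after changing vfs.zfs.dmu_offset_next_sync on the server, should show whether individual calls still stall for several seconds.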

So, your vacation may be looking better, rick

>
> And enjoy your vacation, rick
>
> >
> > -GAWollman
> >


