Date:      Sun, 3 Mar 2024 13:17:30 -0800
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        stable@freebsd.org
Subject:   Re: 13-stable NFS server hang
Message-ID:  <CAM5tNy4BM3fwccjF53ROP-7NojsWMM2fUY2_RA-4GMWfc6Sn4g@mail.gmail.com>
In-Reply-To: <26083.64612.717082.366639@hergotha.csail.mit.edu>
References:  <26078.50375.679881.64018@hergotha.csail.mit.edu> <CAM5tNy7ZZ2bVLmYnOCWzrS9wq6yudoV5JKG5ObRU0=wLt1ofZw@mail.gmail.com> <26083.64612.717082.366639@hergotha.csail.mit.edu>

On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
>
> I wrote previously:
> > PID    TID COMM                TDNAME              KSTACK
> > 997 108481 nfsd                nfsd: master        mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > 997 960918 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > 997 962232 nfsd                nfsd: service       mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
>
> I spent some time this evening looking at this last stack trace, and
> stumbled across the following comment in
> sys/contrib/openzfs/module/zfs/dmu.c:
>
> | /*
> |  * Enable/disable forcing txg sync when dirty checking for holes with lseek().
> |  * By default this is enabled to ensure accurate hole reporting, it can result
> |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy workloads.
> |  * Disabling this option will result in holes never being reported in dirty
> |  * files which is always safe.
> |  */
> | int zfs_dmu_offset_next_sync = 1;
>
> I believe this explains why vn_copy_file_range sometimes takes much
> longer than a second: our servers often have lots of data waiting to
> be written to disk, and if the file being copied was recently modified
> (and so is dirty), this might take several seconds.  I've set
> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> and am watching to see if we have more freezes.
>
> If this does the trick, then I can delay deploying a new kernel until
> April, after my upcoming vacation.
Interesting. Please let us know how it goes.

And enjoy your vacation, rick
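The dmu.c comment quoted above concerns lseek(2) with SEEK_DATA/SEEK_HOLE, which the third stack trace reaches through zfs_holey and dmu_offset_next on its way to txg_wait_synced. As a rough illustration only, here is a minimal userland sketch of the kind of hole-walking loop a sparse-aware copy performs, assuming a system that supports these whence values (FreeBSD, or Linux with _GNU_SOURCE). The helper name count_data_regions is hypothetical and this is not the kernel's vn_generic_copy_file_range code; on a dirty ZFS file with vfs.zfs.dmu_offset_next_sync=1, these lseek calls are the points where the thread may wait for a txg sync.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA/SEEK_HOLE in glibc; harmless on FreeBSD */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk the data regions of an open file with
 * lseek(SEEK_DATA) / lseek(SEEK_HOLE), printing each [start, end) extent.
 * Returns the number of data regions found, or -1 on error.
 */
static long count_data_regions(int fd)
{
    long regions = 0;
    off_t pos = 0;

    assert(fd >= 0);
    for (;;) {
        /* Next offset at or after pos that contains data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data == -1) {
            if (errno == ENXIO) /* no data at or beyond pos: done */
                break;
            return -1;
        }
        /* End of this data region; EOF counts as an implicit hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole == -1)
            return -1;
        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        regions++;
        pos = hole;
    }
    return regions;
}
```

For a fully written, non-sparse file this reports a single region ending at EOF; a filesystem without hole support behaves the same way, treating the whole file as data.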

>
> -GAWollman
>
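To check whether the tunable makes a difference, rather than only watching for further freezes, one could time a single lseek(SEEK_HOLE) on a file that has just been written and not yet synced. The sketch below assumes a POSIX system with clock_gettime(CLOCK_MONOTONIC) and a test file placed on the ZFS dataset being examined; the helpers make_dirty_file and time_seek_hole_ns are hypothetical. With vfs.zfs.dmu_offset_next_sync=1 and a large dirty-data backlog, the timed call may block until the transaction group syncs; with 0 it would be expected to return quickly, reporting no holes in the dirty file.

```c
#define _GNU_SOURCE /* exposes SEEK_HOLE in glibc; harmless on FreeBSD */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file from the mkstemp(3) template `path`
 * and write `len` bytes of non-zero data without calling fsync(), so on ZFS
 * the data is likely still dirty. Returns the open fd, or -1 on error.
 */
static int make_dirty_file(char *path, size_t len)
{
    char buf[4096];

    assert(path != NULL);
    memset(buf, 'x', sizeof buf);

    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    while (len > 0) {
        size_t n = len < sizeof buf ? len : sizeof buf;
        ssize_t w = write(fd, buf, n);
        if (w <= 0) {
            close(fd);
            unlink(path); /* don't leave a partial test file behind */
            return -1;
        }
        len -= (size_t)w;
    }
    return fd;
}

/*
 * Hypothetical helper: time one lseek(fd, 0, SEEK_HOLE). Stores the hole
 * offset in *hole_out and returns elapsed nanoseconds, or -1 on failure.
 */
static long long time_seek_hole_ns(int fd, off_t *hole_out)
{
    struct timespec t0, t1;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) == -1)
        return -1;
    off_t hole = lseek(fd, 0, SEEK_HOLE);
    if (clock_gettime(CLOCK_MONOTONIC, &t1) == -1 || hole == -1)
        return -1;
    *hole_out = hole;
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (long long)(t1.tv_nsec - t0.tv_nsec);
}
```

Comparing the reported times with the sysctl at 1 and at 0, under similar write load, is the intended use; on a non-ZFS filesystem the number says nothing about this tunable.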


