Date:      Fri, 1 Mar 2024 09:00:20 +0100 (CET)
From:      Ronald Klop <ronald-lists@klop.ws>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org, rmacklem@freebsd.org
Subject:   Re: 13-stable NFS server hang
Message-ID:  <1020651467.1592.1709280020993@localhost>
In-Reply-To: <CAM5tNy6v3N-uiULGA0vb_2s0GK1atRh6TYNDGfYMK0PkP46BbQ@mail.gmail.com>


Interesting read. 

Would it be possible to separate the locking for admin actions, such as a client mounting a filesystem, from the traffic of ongoing file operations?

For example, ongoing file operations could work from a read-only view/copy of the mount table; only new operations would have to wait.
The mount itself, in turn, would never need to wait for ongoing operations before it locks the structure.
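Something along the lines of a copy-on-write table, say. Here is a rough C11 sketch of the idea (the names are made up for illustration, not the actual nfsd structures): readers just load a pointer to an immutable snapshot, and an admin action publishes a new snapshot without waiting for them.

    #include <stdatomic.h>
    #include <stddef.h>

    struct mount_table {
        size_t nentries;
        /* ... immutable snapshot of the mount entries ... */
    };

    /* Current snapshot; file operations only ever read it. */
    static _Atomic(struct mount_table *) cur_table;

    /* File-operation path: never blocks, sees a consistent read-only view. */
    static struct mount_table *
    fileop_get_table(void)
    {
        return atomic_load_explicit(&cur_table, memory_order_acquire);
    }

    /*
     * Admin path (e.g. a client mount): build the new copy off to the side,
     * then publish it without waiting for in-flight file operations.
     * The returned old snapshot may only be freed once every reader that
     * could still hold it has finished (RCU/epoch-style reclamation,
     * omitted here).
     */
    static struct mount_table *
    admin_install_table(struct mount_table *newtbl)
    {
        return atomic_exchange_explicit(&cur_table, newtbl,
            memory_order_acq_rel);
    }

The awkward part is of course reclaiming the old snapshot safely, but the file-operation path itself never has to take the exclusive lock.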

Just a thought in the morning.

Regards,
Ronald.

From: Rick Macklem <rick.macklem@gmail.com>
Date: 1 March 2024 00:31
To: Garrett Wollman <wollman@bimajority.org>
CC: stable@freebsd.org, rmacklem@freebsd.org
Subject: Re: 13-stable NFS server hang

> 
> 
> On Wed, Feb 28, 2024 at 4:04PM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman <wollman@bimajority.org> wrote:
> > >
> > > Hi, all,
> > >
> > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > Our NFS servers are running a 13-stable from last December, and
> > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > able to clearly see that there were periods when NFS activity would
> > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > for about 25 seconds before resuming exactly as it was before.
> > >
> > > I wrote a little awk script to watch for this happening and run
> > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > service threads were idle.  The three nfsd threads that had non-idle
> > > kstacks were:
> > >
> > >   PID    TID COMM                TDNAME              KSTACK
> > >   997 108481 nfsd                nfsd: master        mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > >   997 960918 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >   997 962232 nfsd                nfsd: service       mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >
> > > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > > the "master" nfsd thread is actually servicing an RPC which requires
> > > obtaining a lock.  The "master" getting stuck while performing client
> > > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > > client tries to write into a near-full filesystem, so this problem
> > > would be more evidence that the dispatching function should not be
> > > mixed with actual operations.  I don't know what the clients are
> > > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > > lock that is needed by one or both of the other two threads?
> > >
> > > Near-term I could change nfsrvd_copy_file_range to just
> > > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > > back, but I figured I would ask if anyone else has seen this.
> > I have attached a little patch that should limit the server's Copy size
> > to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> > Hopefully this makes sure that the Copy does not take too long.
> >
> > You could try this instead of disabling Copy. It would be nice to know
> > whether this is sufficient. (If not, I'll probably add a sysctl to disable Copy.)
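
[For what it's worth, I imagine the clamp as something like the fragment below; the sysctl name is taken from your mail, the rest of the names are guesses rather than the actual patch.]

    #include <stdint.h>

    /* Guess at the shape of the clamp: cap a single Copy at
     * vfs.nfsd.maxcopyrange (10 Mbytes by default) and let the client
     * issue further Copy requests for the remainder. */
    static uint64_t nfsd_maxcopyrange = 10 * 1024 * 1024;

    static uint64_t
    nfsd_clamp_copy_len(uint64_t len)
    {
        if (len > nfsd_maxcopyrange)
            len = nfsd_maxcopyrange;
        return (len);
    }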
> I did a quick test without and with this patch, where I copied a 1Gbyte file.
> 
> Without this patch, the Copy RPCs mostly replied in just under 1sec
> (which is what the flag requests), but took over 4sec for one of the Copy
> operations. This implies that one Read/Write of 1Mbyte on the server
> took over 3 seconds.
> I noticed the first Copy did over 600Mbytes, but the rest did about 100Mbytes
> each and it was one of these 100Mbyte Copy operations that took over 4sec.
> 
> With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes
> each and they took a consistent 0.25-0.3sec to reply. (This is a test of a local
> mount on an old laptop, so nowhere near a server hardware config.)
> 
> So, the patch might be sufficient?
> 
> It would be nice to avoid disabling Copy, since it avoids reading the data
> into the client and then writing it back to the server.
> 
> I will probably commit both patches (10Mbyte clip of Copy size and
> disabling Copy) to main soon, since I cannot say if clipping the size
> of the Copy will always be sufficient.
> 
> Please let us know how trying these patches goes, rick
> 
> >
> > rick
> >
> > >
> > > -GAWollman
> > >
> > >
> 
> 
> 
> 
> 


