Date: Fri, 1 Mar 2024 09:00:20 +0100 (CET)
From: Ronald Klop <ronald-lists@klop.ws>
To: Rick Macklem <rick.macklem@gmail.com>
Cc: Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org, rmacklem@freebsd.org
Subject: Re: 13-stable NFS server hang
Message-ID: <1020651467.1592.1709280020993@localhost>
In-Reply-To: <CAM5tNy6v3N-uiULGA0vb_2s0GK1atRh6TYNDGfYMK0PkP46BbQ@mail.gmail.com>
Interesting read.

Would it be possible to separate the locking for admin actions, like a
client mounting an fs, from the traffic of ongoing file operations?

For example, ongoing file operations could work from a read-only
view/copy of the mount table; only new operations would have to wait.
The mount would then never need to wait for ongoing operations before
locking the structure. (A rough sketch of this idea follows at the end
of this message.)

Just a thought in the morning.

Regards,
Ronald.

From: Rick Macklem <rick.macklem@gmail.com>
Date: 1 March 2024 00:31
To: Garrett Wollman <wollman@bimajority.org>
CC: stable@freebsd.org, rmacklem@freebsd.org
Subject: Re: 13-stable NFS server hang

> On Wed, Feb 28, 2024 at 4:04PM Rick Macklem <rick.macklem@gmail.com> wrote:
> >
> > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman <wollman@bimajority.org> wrote:
> > >
> > > Hi, all,
> > >
> > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > Our NFS servers are running a 13-stable from last December, and
> > > tonight I sat in front of the monitor watching `nfsstat -dW`. I was
> > > able to clearly see that there were periods when NFS activity would
> > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > for about 25 seconds before resuming exactly as it was before.
> > >
> > > I wrote a little awk script to watch for this happening and run
> > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > service threads were idle. The three nfsd threads that had non-idle
> > > kstacks were:
> > >
> > > PID  TID     COMM  TDNAME         KSTACK
> > > 997  108481  nfsd  nfsd: master   mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > > 997  960918  nfsd  nfsd: service  mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > > 997  962232  nfsd  nfsd: service  mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >
> > > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > > the "master" nfsd thread is actually servicing an RPC which requires
> > > obtaining a lock. The "master" getting stuck while performing client
> > > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > > client tries to write into a near-full filesystem, so this problem
> > > would be more evidence that the dispatching function should not be
> > > mixed with actual operations. I don't know what the clients are
> > > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > > lock that is needed by one or both of the other two threads?
> > >
> > > Near-term I could change nfsrvd_copy_file_range to just
> > > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > > back, but I figured I would ask if anyone else has seen this.
> >
> > I have attached a little patch that should limit the server's Copy size
> > to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> > Hopefully this makes sure that the Copy does not take too long.
> >
> > You could try this instead of disabling Copy. It would be nice to know
> > if this is sufficient?
> > (If not, I'll probably add a sysctl to disable Copy.)
>
> I did a quick test without/with this patch, where I copied a 1Gbyte file.
>
> Without this patch, the Copy RPCs mostly replied in just under 1sec
> (which is what the flag requests), but one of the Copy operations took
> over 4sec. This implies that one 1Mbyte Read/Write on the server took
> over 3 seconds.
> I noticed the first Copy did over 600Mbytes, but the rest did about
> 100Mbytes each, and it was one of these 100Mbyte Copy operations that
> took over 4sec.
>
> With the patch, there were a lot more Copy RPCs (as expected), of
> 10Mbytes each, and they took a consistent 0.25-0.3sec to reply. (This
> is a test of a local mount on an old laptop, so nowhere near a server
> hardware config.)
>
> So, the patch might be sufficient?
>
> It would be nice to avoid disabling Copy, since Copy avoids reading the
> data into the client and then writing it back to the server.
>
> I will probably commit both patches (the 10Mbyte clip of the Copy size
> and disabling Copy) to main soon, since I cannot say whether clipping
> the size of the Copy will always be sufficient.
>
> Please let us know how trying these patches goes, rick
>
> >
> > rick
> >
> > >
> > > -GAWollman
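
To make the read-mostly idea above concrete, here is a minimal userland
C sketch of the copy-modify-publish pattern Ronald describes. Every name
in it is invented for illustration; this is not the FreeBSD nfsd code,
and a kernel version would use a kernel-appropriate mechanism (short
lock sections, epochs, or similar) rather than a pthread mutex. File
operations pin the current immutable snapshot of a toy mount table; a
mount builds a new snapshot and holds the lock only for the pointer swap.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct mount_table {
        int     refs;           /* snapshot reference count, under table_mtx */
        size_t  nmounts;
        char    mounts[8][64];  /* toy stand-in for the real mount entries */
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct mount_table *current_table;

/* One-time setup: install an empty snapshot. */
static void
table_init(void)
{
        current_table = calloc(1, sizeof(*current_table));
        current_table->refs = 1;        /* the global pointer holds one ref */
}

/*
 * File-operation path: pin the current snapshot. The lock is held only
 * long enough to read the pointer and bump the refcount, never across
 * the (possibly slow) file operation itself.
 */
static struct mount_table *
table_acquire(void)
{
        struct mount_table *t;

        pthread_mutex_lock(&table_mtx);
        t = current_table;
        t->refs++;
        pthread_mutex_unlock(&table_mtx);
        return (t);
}

static void
table_release(struct mount_table *t)
{
        int last;

        pthread_mutex_lock(&table_mtx);
        last = (--t->refs == 0);
        pthread_mutex_unlock(&table_mtx);
        if (last)
                free(t);        /* last user of a retired snapshot */
}

/*
 * Admin path (e.g. a client mount): copy, modify, publish. It never
 * waits for in-flight file operations; they keep using the old
 * snapshot until they drop their last reference.
 */
static void
table_add_mount(const char *path)
{
        struct mount_table *newtab = malloc(sizeof(*newtab));
        struct mount_table *oldtab;
        int last;

        pthread_mutex_lock(&table_mtx);
        *newtab = *current_table;       /* bounds checks elided for brevity */
        newtab->refs = 1;               /* ref held by the global pointer */
        snprintf(newtab->mounts[newtab->nmounts++],
            sizeof(newtab->mounts[0]), "%s", path);
        oldtab = current_table;
        current_table = newtab;
        last = (--oldtab->refs == 0);
        pthread_mutex_unlock(&table_mtx);
        if (last)
                free(oldtab);
}

int
main(void)
{
        table_init();
        table_add_mount("/export/home");
        struct mount_table *t = table_acquire();
        printf("%zu mount(s), first: %s\n", t->nmounts, t->mounts[0]);
        table_release(t);
        return (0);
}

The key property is that the lock is never held across a slow operation:
a long Copy keeps working on its snapshot without blocking a mount, and
a mount never waits for in-flight operations.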
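The patch Rick refers to is an attachment to the original mail and is
not reproduced in this archive, so the following is only a rough sketch
of the behaviour he describes (clamping each server-side Copy to
vfs.nfsd.maxcopyrange, default 10Mbytes), with invented names and no
claim to match the actual diff. A short Copy is legal in NFSv4.2: the
reply reports how many bytes were actually copied and the client issues
further Copy RPCs for the remainder.

#include <stdint.h>

/* Mirrors the described 10Mbyte default for vfs.nfsd.maxcopyrange. */
static uint64_t nfs_maxcopyrange = 10 * 1024 * 1024;

/*
 * Clamp the client's requested Copy length before doing the local
 * VOP_COPY_FILE_RANGE() work, so that no single Copy RPC can hold an
 * nfsd service thread (and any locks it needs) for a long time.
 */
static uint64_t
clamp_copy_len(uint64_t requested)
{
        return (requested > nfs_maxcopyrange ? nfs_maxcopyrange : requested);
}

The second patch mentioned (disabling Copy) would instead reject the
operation outright, much as Garrett suggests with an unconditional
NFSERR_NOTSUP, forcing clients to fall back to plain Reads and Writes.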