From nobody Fri Mar  1 08:00:20 2024
X-Original-To: stable@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TmL9q1mcHz5C8NV
	for <stable@mlmmj.nyi.freebsd.org>; Fri,  1 Mar 2024 08:00:31 +0000 (UTC)
	(envelope-from SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl)
Received: from smtp-relay-int-backup.realworks.nl (smtp-relay-int-backup.realworks.nl [87.255.56.188])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 4TmL9n4q1Tz4pb9;
	Fri,  1 Mar 2024 08:00:29 +0000 (UTC)
	(envelope-from SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl)
Authentication-Results: mx1.freebsd.org;
	dkim=pass header.d=klop.ws header.s=rw2 header.b=pSRAALN+;
	dmarc=pass (policy=quarantine) header.from=klop.ws;
	spf=pass (mx1.freebsd.org: domain of "SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl" designates 87.255.56.188 as permitted sender) smtp.mailfrom="SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl"
Received: from smtp-relay-int.realworks.nl (rwvirtual375.colo.realworks.nl [10.0.10.75])
	by mailrelayint1.colo2.realworks.nl (Postfix) with ESMTP id 4TmL9d2fctz1Zg;
	Fri,  1 Mar 2024 09:00:21 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=klop.ws; s=rw2;
	t=1709280021;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to; bh=6YLdcB4oAa81IK4ne6Yd37g++Gdb9f2xIaikl8IJM8I=;
	b=pSRAALN+rS44Dh+idXsYqS1/5g1jKlHIYwThYs8d5Rp1DHFUgD8SisEXwlTZZHNNVgv7XS
	7GKQN+XWhT7ZF3drNO9gWE9NqitWlHz0zRNy4j1SoZZ1di+HD0WYZCdIGF/FbRntQJ/lbo
	4WmMCCTz2JmaughjVJsVQ5d/k6Ui4PbB9NzerEHQKJ3yElMcEBUHCOZhLAAopHsOKbjAys
	3kKRklhn8smBwCg7GtlKXiYMNmys1ZvvLTlhta/rui6mm97V/z9/dk9XMJQzLlSFLqjx8I
	rviIFsl3Phe0V6FxjJJCZ0oXLABgTK8nGSJ1YvuwJ9NLDVpbhVLBBL0xw1dT8Q==
Received: from rwvirtual375.colo.realworks.nl (localhost [127.0.0.1])
	by rwvirtual375.colo.realworks.nl (Postfix) with ESMTP id 036DD401A9;
	Fri,  1 Mar 2024 09:00:20 +0100 (CET)
Date: Fri, 1 Mar 2024 09:00:20 +0100 (CET)
From: Ronald Klop <ronald-lists@klop.ws>
To: Rick Macklem <rick.macklem@gmail.com>
Cc: Garrett Wollman <wollman@bimajority.org>, stable@freebsd.org,
	rmacklem@freebsd.org
Message-ID: <1020651467.1592.1709280020993@localhost>
In-Reply-To: <CAM5tNy6v3N-uiULGA0vb_2s0GK1atRh6TYNDGfYMK0PkP46BbQ@mail.gmail.com>
Subject: Re: 13-stable NFS server hang
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
List-Help: <mailto:stable+help@freebsd.org>
List-Post: <mailto:stable@freebsd.org>
List-Subscribe: <mailto:stable+subscribe@freebsd.org>
List-Unsubscribe: <mailto:stable+unsubscribe@freebsd.org>
Sender: owner-freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
MIME-Version: 1.0
Content-Type: multipart/alternative; 
	boundary="----=_Part_1591_1207250147.1709280020983"
X-Mailer: Realworks (692.38)
X-Originating-Host: from (localhost [127.0.0.1])
	by rwvirtual375 [10.0.10.75]
	with HTTP; Fri, 01 Mar 2024 09:00:20 +0100
Importance: Normal
X-Priority: 3 (Normal)
X-Spamd-Bar: -
X-Spamd-Result: default: False [-1.70 / 15.00];
	SUSPICIOUS_RECIPS(1.50)[];
	NEURAL_HAM_LONG(-1.00)[-1.000];
	NEURAL_HAM_MEDIUM(-1.00)[-1.000];
	NEURAL_HAM_SHORT(-1.00)[-0.999];
	DMARC_POLICY_ALLOW(-0.50)[klop.ws,quarantine];
	MID_RHS_NOT_FQDN(0.50)[];
	FORGED_SENDER(0.30)[ronald-lists@klop.ws,SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl];
	R_SPF_ALLOW(-0.20)[+ip4:87.255.56.128/26];
	R_DKIM_ALLOW(-0.20)[klop.ws:s=rw2];
	MIME_GOOD(-0.10)[multipart/alternative,text/plain];
	ARC_NA(0.00)[];
	ASN(0.00)[asn:38930, ipnet:87.255.32.0/19, country:NL];
	TO_DN_SOME(0.00)[];
	FROM_HAS_DN(0.00)[];
	MIME_TRACE(0.00)[0:+,1:+,2:~];
	HAS_X_PRIO_THREE(0.00)[3];
	TAGGED_RCPT(0.00)[];
	FROM_NEQ_ENVFROM(0.00)[ronald-lists@klop.ws,SRS0=zTW4=KH=klop.ws=ronald-lists@realworks.nl];
	RCVD_TLS_LAST(0.00)[];
	RCVD_COUNT_TWO(0.00)[2];
	TO_MATCH_ENVRCPT_SOME(0.00)[];
	MLMMJ_DEST(0.00)[stable@freebsd.org];
	FREEMAIL_TO(0.00)[gmail.com];
	RCPT_COUNT_THREE(0.00)[4];
	DKIM_TRACE(0.00)[klop.ws:+]
X-Rspamd-Queue-Id: 4TmL9n4q1Tz4pb9

------=_Part_1591_1207250147.1709280020983
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Interesting read. 

 Would it be possible to separate locking for admin actions like a client mounting an fs from traffic flowing for file operations?

Like ongoing file operations could have a read only view/copy of the mount table. Only new operations will have to wait.
But the mount never needs to wait for ongoing operations before locking the structure. 

Just a thought in the morning

Regards,
Ronald.

Van: Rick Macklem <rick.macklem@gmail.com>
Datum: 1 maart 2024 00:31
Aan: Garrett Wollman <wollman@bimajority.org>
CC: stable@freebsd.org, rmacklem@freebsd.org
Onderwerp: Re: 13-stable NFS server hang

> 
> 
> On Wed, Feb 28, 2024 at 4:04PM Rick Macklem  wrote:
> >
> > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman  wrote:
> > >
> > > Hi, all,
> > >
> > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > Our NFS servers are running a 13-stable from last December, and
> > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > able to clearly see that there were periods when NFS activity would
> > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > for about 25 seconds before resuming exactly as it was before.
> > >
> > > I wrote a little awk script to watch for this happening and run
> > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > service threads were idle.  The three nfsd threads that had non-idle
> > > kstacks were:
> > >
> > >   PID    TID COMM                TDNAME              KSTACK
> > >   997 108481 nfsd                nfsd: master        mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> > >   997 960918 nfsd                nfsd: service       mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >   997 962232 nfsd                nfsd: service       mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> > >
> > > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > > the "master" nfsd thread is actually servicing an RPC which requires
> > > obtaining a lock.  The "master" getting stuck while performing client
> > > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > > client tries to write into a near-full filesystem, so this problem
> > > would be more evidence that the dispatching function should not be
> > > mixed with actual operations.  I don't know what the clients are
> > > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > > lock that is needed by one or both of the other two threads?
> > >
> > > Near-term I could change nfsrvd_copy_file_range to just
> > > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > > back, but I figured I would ask if anyone else has seen this.
> > I have attached a little patch that should limit the server's Copy size
> > to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> > Hopefully this makes sure that the Copy does not take too long.
> >
> > You could try this instead of disabling Copy. It would be nice to know if
> > this is suffciient? (If not, I'll probably add a sysctl to disable Copy.)
> I did a quick test without/with this patch,where I copied a 1Gbyte file.
> 
> Without this patch, the Copy RPCs mostly replied in just under 1sec
> (which is what the flag requests), but took over 4sec for one of the Copy
> operations. This implies that one Read/Write of 1Mbyte on the server
> took over 3 seconds.
> I noticed the first Copy did over 600Mbytes, but the rest did about 100Mbytes
> each and it was one of these 100Mbyte Copy operations that took over 4sec.
> 
> With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes
> each and they took a consistent 0.25-0.3sec to reply. (This is a test of a local
> mount on an old laptop, so nowhere near a server hardware config.)
> 
> So, the patch might be sufficient?
> 
> It would be nice to avoid disabling Copy, since it avoids reading the data
> into the client and then writing it back to the server.
> 
> I will probably commit both patches (10Mbyte clip of Copy size and
> disabling Copy) to main soon, since I cannot say if clipping the size
> of the Copy will always be sufficient.
> 
> Pleas let us know how trying these patches goes, rick
> 
> >
> > rick
> >
> > >
> > > -GAWollman
> > >
> > >
> 
> 
> 
> 
> 
------=_Part_1591_1207250147.1709280020983
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

<html><head></head><body>Interesting read.&nbsp;<div><br></div><div>&nbsp;W=
ould it be possible to separate locking for admin actions like a client mou=
nting an fs from traffic flowing for file operations?</div><div><br></div><=
div>Like ongoing file operations could have a read only view/copy of the mo=
unt table. Only new operations will have to wait.</div><div>But the mount n=
ever needs to wait for ongoing operations before locking the structure.&nbs=
p;</div><div><br></div><div>Just a thought in the morning</div><div><br></d=
iv><div>Regards,</div><div>Ronald.</div><div><br><p><small><strong>Van:</st=
rong> Rick Macklem &lt;rick.macklem@gmail.com&gt;<br><strong>Datum:</strong=
> 1 maart 2024 00:31<br><strong>Aan:</strong> Garrett Wollman &lt;wollman@b=
imajority.org&gt;<br><strong>CC:</strong> stable@freebsd.org, rmacklem@free=
bsd.org<br><strong>Onderwerp:</strong> Re: 13-stable NFS server hang<br></s=
mall></p><blockquote style=3D"margin-left: 5px; border-left: 3px solid #ccc=
; margin-right: 0px; padding-left: 5px;"><div class=3D"MessageRFC822Viewer =
do_not_remove" id=3D"P"><!-- P -->
<!-- processMimeMessage --><div class=3D"TextPlainViewer do_not_remove" id=
=3D"P.P"><!-- P.P -->On Wed, Feb 28, 2024 at 4:04PM Rick Macklem <rick.mack=
lem@gmail.com> wrote:<br>
&gt;<br>
&gt; On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman <wollman@bimajority.org=
> wrote:<br>
&gt; &gt;<br>
&gt; &gt; Hi, all,<br>
&gt; &gt;<br>
&gt; &gt; We've had some complaints of NFS hanging at unpredictable interva=
ls.<br>
&gt; &gt; Our NFS servers are running a 13-stable from last December, and<b=
r>
&gt; &gt; tonight I sat in front of the monitor watching `nfsstat -dW`. &nb=
sp;I was<br>
&gt; &gt; able to clearly see that there were periods when NFS activity wou=
ld<br>
&gt; &gt; drop *instantly* from 30,000 ops/s to flat zero, which would last=
<br>
&gt; &gt; for about 25 seconds before resuming exactly as it was before.<br=
>
&gt; &gt;<br>
&gt; &gt; I wrote a little awk script to watch for this happening and run<b=
r>
&gt; &gt; `procstat -k` on the nfsd process, and I saw that all but two of =
the<br>
&gt; &gt; service threads were idle. &nbsp;The three nfsd threads that had =
non-idle<br>
&gt; &gt; kstacks were:<br>
&gt; &gt;<br>
&gt; &gt; &nbsp;&nbsp;PID &nbsp;&nbsp;&nbsp;TID COMM &nbsp;&nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TDNAME =
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;KSTACK<br>
&gt; &gt; &nbsp;&nbsp;997 108481 nfsd &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nfsd: master &nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mi_switch sleepq_timedwait _sleep nfsv4_lo=
ck nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_=
nfsd sys_nfssvc amd64_syscall fast_syscall_common<br>
&gt; &gt; &nbsp;&nbsp;997 960918 nfsd &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nfsd: service &nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;mi_switch sleepq_timedwait _sleep nfsv4_lock nf=
srv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_interna=
l svc_thread_start fork_exit fork_trampoline<br>
&gt; &gt; &nbsp;&nbsp;997 962232 nfsd &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;nfsd: service &nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;mi_switch _cv_wait txg_wait_synced_impl txg_wai=
t_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_r=
ange vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_c=
opy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_star=
t fork_exit fork_trampoline<br>
&gt; &gt;<br>
&gt; &gt; I'm suspicious of two things: first, the copy_file_range RPC; sec=
ond,<br>
&gt; &gt; the "master" nfsd thread is actually servicing an RPC which requi=
res<br>
&gt; &gt; obtaining a lock. &nbsp;The "master" getting stuck while performi=
ng client<br>
&gt; &gt; RPCs is, I believe, the reason NFS service grinds to a halt when =
a<br>
&gt; &gt; client tries to write into a near-full filesystem, so this proble=
m<br>
&gt; &gt; would be more evidence that the dispatching function should not b=
e<br>
&gt; &gt; mixed with actual operations. &nbsp;I don't know what the clients=
 are<br>
&gt; &gt; doing, but is it possible that nfsrvd_copy_file_range is holding =
a<br>
&gt; &gt; lock that is needed by one or both of the other two threads?<br>
&gt; &gt;<br>
&gt; &gt; Near-term I could change nfsrvd_copy_file_range to just<br>
&gt; &gt; unconditionally return NFSERR_NOTSUP and force the clients to fal=
l<br>
&gt; &gt; back, but I figured I would ask if anyone else has seen this.<br>
&gt; I have attached a little patch that should limit the server's Copy siz=
e<br>
&gt; to vfs.nfsd.maxcopyrange (default of 10Mbytes).<br>
&gt; Hopefully this makes sure that the Copy does not take too long.<br>
&gt;<br>
&gt; You could try this instead of disabling Copy. It would be nice to know=
 if<br>
&gt; this is suffciient? (If not, I'll probably add a sysctl to disable Cop=
y.)<br>
I did a quick test without/with this patch,where I copied a 1Gbyte file.<br=
>
<br>
Without this patch, the Copy RPCs mostly replied in just under 1sec<br>
(which is what the flag requests), but took over 4sec for one of the Copy<b=
r>
operations. This implies that one Read/Write of 1Mbyte on the server<br>
took over 3 seconds.<br>
I noticed the first Copy did over 600Mbytes, but the rest did about 100Mbyt=
es<br>
each and it was one of these 100Mbyte Copy operations that took over 4sec.<=
br>
<br>
With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes<b=
r>
each and they took a consistent 0.25-0.3sec to reply. (This is a test of a =
local<br>
mount on an old laptop, so nowhere near a server hardware config.)<br>
<br>
So, the patch might be sufficient?<br>
<br>
It would be nice to avoid disabling Copy, since it avoids reading the data<=
br>
into the client and then writing it back to the server.<br>
<br>
I will probably commit both patches (10Mbyte clip of Copy size and<br>
disabling Copy) to main soon, since I cannot say if clipping the size<br>
of the Copy will always be sufficient.<br>
<br>
Pleas let us know how trying these patches goes, rick<br>
<br>
&gt;<br>
&gt; rick<br>
&gt;<br>
&gt; &gt;<br>
&gt; &gt; -GAWollman<br>
&gt; &gt;<br>
&gt; &gt;<br>
<br>
</wollman@bimajority.org></rick.macklem@gmail.com></div><!-- TextPlainViewe=
r -->
<hr>
</div><!-- MessageRFC822Viewer -->
</blockquote><br><br><br></div></body></html>
------=_Part_1591_1207250147.1709280020983--