From nobody Sat Mar 19 01:18:50 2022
X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id D8A981A33C63
	for <freebsd-stable@mlmmj.nyi.freebsd.org>; Sat, 19 Mar 2022 01:17:52 +0000 (UTC)
	(envelope-from ota@j.email.ne.jp)
Received: from mail06.asahi-net.or.jp (mail06.asahi-net.or.jp [202.224.55.46])
	by mx1.freebsd.org (Postfix) with ESMTP id 4KL30G5knDz3M8x
	for <freebsd-stable@freebsd.org>; Sat, 19 Mar 2022 01:17:50 +0000 (UTC)
	(envelope-from ota@j.email.ne.jp)
Received: from e20.advok.com (pool-96-225-64-148.nwrknj.fios.verizon.net [96.225.64.148])
	(Authenticated sender: NR2Y-OOT)
	by mail06.asahi-net.or.jp (Postfix) with ESMTPSA id 5DF874D8A2;
	Sat, 19 Mar 2022 10:17:41 +0900 (JST)
Date: Fri, 18 Mar 2022 21:18:50 -0400
From: Yoshihiro Ota <ota@j.email.ne.jp>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: nfsd becomes slow when machine CPU usage is at or over 100% on
 STABLE/13
Message-Id: <20220318211850.67b77d43b3a02043c3819bf3@j.email.ne.jp>
In-Reply-To: <YT2PR01MB9730D7B51D325258AAA29828DD0A9@YT2PR01MB9730.CANPRD01.PROD.OUTLOOK.COM>
References: <20220309034601.ea3135e31aec3ffb2623f145@j.email.ne.jp>
	<YT2PR01MB9730D7B51D325258AAA29828DD0A9@YT2PR01MB9730.CANPRD01.PROD.OUTLOOK.COM>
X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; amd64-portbld-freebsd12.2)
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
List-Help: <mailto:stable+help@freebsd.org>
List-Post: <mailto:stable@freebsd.org>
List-Subscribe: <mailto:stable+subscribe@freebsd.org>
List-Unsubscribe: <mailto:stable+unsubscribe@freebsd.org>
Sender: owner-freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Hi,

In short, it looks like releng/13.1 doesn't have the issue.
I haven't fully confirmed why, but I suspect the debugging options on stable cause this performance penalty.

It took a while to build the bisect kernels (due to some compile errors), and the test results looked suspicious: all of the stable kernels seemed to have the issue.

I built several versions between the releng/13.0 branch point and stable/13 (before releng/13.1 was created), and all of them showed the same performance degradation.

I then started suspecting the debug options on stable, so I built releng/13.1 and tested it.
I don't see the NFS slowdown there, unlike on stable/13.
releng/13.0 and releng/12.2 were also fine.
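
A minimal Python sketch for anyone reproducing the comparison: it reports whether the 1-minute load average has reached the number of cores, which is the condition under which the slowdown appeared. The function name and the choice of the 1-minute average are assumptions for illustration and were not part of the original test.

```python
import os


def cpus_saturated(load_avg: float, cores: int) -> bool:
    """Return True when the load average is at or above the core count."""
    return load_avg >= cores


if __name__ == "__main__":
    load_1min = os.getloadavg()[0]  # 1-minute load average
    cores = os.cpu_count() or 1     # cpu_count() may return None
    state = "saturated" if cpus_saturated(load_1min, cores) else "has idle CPU"
    print(f"load {load_1min:.2f} on {cores} cores: {state}")
```

Running it on the NFS server before and after starting the extra load shows which side of the threshold each rsync run was measured on.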

Hiro

On Wed, 9 Mar 2022 14:39:39 +0000
Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Yoshihiro Ota <ota@j.email.ne.jp> wrote:
> > Hi,
> >
> > I'm on stable/13 with latest code base.
> > I started testing pre-13.1 branch.
> >
> > I noticed major performance degradation with NFS when all CPUs are fully
> > utilized.
> >
> > This happens with stable/13 but not with releng/13.0 or releng/12.3.
> NFS performance is sensitive to RPC response time.
> Since this only happens when the CPUs are busy, I'd suspect:
> - Kernel thread scheduling changes
> or
> - Timing of receive socket upcalls (which wake up the nfsd kernel threads).
> 
> I suspect bisecting to the actual commit that causes this is the only way
> to find it.
> If you know of a working stable/13 that is more recent than 13.0, it would
> help. If not, you could start at this commit (which did make socket upcall changes):
> commit 55cc0a478506ee1c2db7b2f9aadb9855e5490af3
> which was done on May 21, 2021.
> 
> Maybe others can suggest commits related to thread scheduling (which I
> know nothing about).
> 
> If you don't have the time/resources to bisect, I doubt this will get resolved.
> 
> Good luck with it, rick
> 
> > I had an NFS server with the above versions and rsynced an NFS mount to a UFS mount on the NFS clients.
> > My NFS server has 4 cores.
> > When I had a load average of 3 with make buildworld -j3, the NFS server was fine.
> > After adding one more unit of load, NFS server throughput dropped to about 10% of what it was before.
> > After bringing the load average back to 3, performance recovered, and it dropped again once the load went over 4.
> > The disk was fully available for rsync; buildworld was done on another disk.
> >
> >
> > By the way, someone told me his smbfs was also slow, and he suspected a TCP/IP regression
> > rather than NFS.
> >
> > Hiro
> 
>