Date: Fri, 18 Mar 2022 21:18:50 -0400
From: Yoshihiro Ota <ota@j.email.ne.jp>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: nfsd becomes slow when machine CPU usage is at or over 100% on STABLE/13
Message-ID: <20220318211850.67b77d43b3a02043c3819bf3@j.email.ne.jp>
In-Reply-To: <YT2PR01MB9730D7B51D325258AAA29828DD0A9@YT2PR01MB9730.CANPRD01.PROD.OUTLOOK.COM>
References: <20220309034601.ea3135e31aec3ffb2623f145@j.email.ne.jp> <YT2PR01MB9730D7B51D325258AAA29828DD0A9@YT2PR01MB9730.CANPRD01.PROD.OUTLOOK.COM>

Hi,

In short, it looks like releng/13.1 doesn't have the issue.
I haven't fully confirmed why, but I suspect that the debugging options on stable cause this performance penalty.

It took a while to build the bisect kernels (due to some compile errors), and the test results looked suspicious: all of the stable kernels seemed to have the issue.
I built several versions between the releng/13.0 branch point and stable/13 (before releng/13.1 was created), and all of them showed the performance degradation.
I started suspecting the stable debug options, and thus built releng/13.1 and tested it.
With releng/13.1 I don't see the NFS slowdown that I see on stable/13.
releng/13.0 and releng/12.2 were also fine.

Hiro

On Wed, 9 Mar 2022 14:39:39 +0000
Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Yoshihiro Ota <ota@j.email.ne.jp> wrote:
> > Hi,
> >
> > I'm on stable/13 with latest code base.
> > I started testing pre-13.1 branch.
> >
> > I noticed major performance degradation with NFS when all CPUs are fully
> > utilized.
> >
> > This happens with stable/13 but not releng/13.0 nor releng/12.3.
> NFS performance is sensitive to RPC response time.
> Since this only happens when the CPUs are busy, I'd suspect:
> - Kernel thread scheduling changes
> or
> - Timing of receive socket upcalls (which wake up the nfsd kernel threads).
>
> I suspect bisecting to the actual commit that causes this is the only way
> to find it.
> If you know of a working stable/13 that is more recent than 13.0, it would
> help. If not, you could start at this commit (which did make socket upcall changes):
> commit 55cc0a478506ee1c2db7b2f9aadb9855e5490af3
> which was done on May 21, 2021.
>
> Maybe others can suggest commits related to thread scheduling (which I
> know nothing about).
>
> If you don't have the time/resources to bisect, I doubt this will get resolved.
>
> Good luck with it, rick
>
> I had an NFS server with the above versions and rsynced an NFS mount to a UFS mount on the NFS clients.
> My NFS server has 4 cores.
> When I had a load average of 3 with make buildworld -j3, the NFS server was fine.
> After adding another 1 load, NFS server throughput came down to about 10% of what it was before.
> After going back to a load average of 3, performance recovered, and it dropped again once the load went over 4.
> The disk was fully available for rsync; buildworld was done on another disk.
>
>
> Someone told me his smbfs was also slow and he suspected a TCP/IP regression instead of NFS, by the way.
>
> Hiro
>