Date: Wed, 28 Jul 2021 23:07:45 -0700
From: Kevin Bowling <kevin.bowling@kev009.com>
To: Alexander Motin <mav@freebsd.org>
Cc: dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org, src-committers@freebsd.org
Subject: Re: git: aefe0a8c32d3 - main - Refactor/optimize cpu_search_*().
Message-ID: <CAK7dMtBATCR=SRW3MqLQx9e878=wi-d60neCzZLmiRm3k_o8YQ@mail.gmail.com>
In-Reply-To: <202107290200.16T20XOM038857@gitrepo.freebsd.org>
References: <202107290200.16T20XOM038857@gitrepo.freebsd.org>

On Wed, Jul 28, 2021 at 7:00 PM Alexander Motin <mav@freebsd.org> wrote:
> The branch main has been updated by mav:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=aefe0a8c32d370f2fdd0d0771eb59f8845beda17
>
> commit aefe0a8c32d370f2fdd0d0771eb59f8845beda17
> Author:     Alexander Motin <mav@FreeBSD.org>
> AuthorDate: 2021-07-29 01:18:50 +0000
> Commit:     Alexander Motin <mav@FreeBSD.org>
> CommitDate: 2021-07-29 02:00:29 +0000
>
>     Refactor/optimize cpu_search_*().
>
>     Remove cpu_search_both(), unused for many years.  Without it there is
>     less sense for the trick of compiling common cpu_search() into separate
>     cpu_search_lowest() and cpu_search_highest(), so split them completely,
>     making code more readable.  While there, split iteration over children
>     groups and CPUs, complicating code for very small deduplication.
>
>     Stop passing cpuset_t arguments by value and avoid some manipulations.
>     Since MAXCPU bump from 64 to 256, what was a single register turned
>     into 32-byte memory array, requiring memory allocation and accesses.
>     Splitting struct cpu_search into parameter and result parts allows to
>     even more reduce stack usage, since the first can be passed through
>     on recursion.
>
>     Remove CPU_FFS() from the hot paths, precalculating first and last CPU
>     for each CPU group in advance during initialization.  Again, it was
>     not a problem for 64 CPUs before, but for 256 FFS needs much more code.
>
>     With these changes on 80-thread system doing ~260K uncached ZFS reads
>     per second I observe ~30% reduction of time spent in cpu_search_*().

Nice! I recall seeing contention here on other workloads on high core
count systems.

Regards,
Kevin
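
For readers following the thread, the two optimizations named in the quoted commit message (passing the CPU set by pointer rather than by value, and precalculating the first and last CPU of each group at initialization) can be sketched roughly as below. This is an illustrative sketch only, not the code from the commit: the names toy_cpuset, toy_group, toy_group_init and toy_search_lowest are hypothetical stand-ins, and the "least loaded CPU" criterion is a simplifying assumption about what the search does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAXCPU 256                  /* MAXCPU value cited in the commit message */
#define TOY_WORDS  (TOY_MAXCPU / 64)

/* Hypothetical stand-in for cpuset_t: 256 bits, i.e. a 32-byte array. */
struct toy_cpuset {
	uint64_t bits[TOY_WORDS];
};

/* Hypothetical stand-in for one CPU group in the topology tree. */
struct toy_group {
	struct toy_cpuset mask;
	int first;	/* lowest CPU id in mask, -1 if the mask is empty */
	int last;	/* highest CPU id in mask, -1 if the mask is empty */
};

static void
toy_set(struct toy_cpuset *set, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_MAXCPU);
	set->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static int
toy_isset(const struct toy_cpuset *set, int cpu)
{
	return ((int)((set->bits[cpu / 64] >> (cpu % 64)) & 1));
}

/* Slow path, run once at initialization: scan the full mask. */
static void
toy_group_init(struct toy_group *g, const struct toy_cpuset *mask)
{
	g->mask = *mask;
	g->first = g->last = -1;
	for (int cpu = 0; cpu < TOY_MAXCPU; cpu++) {
		if (toy_isset(mask, cpu)) {
			if (g->first < 0)
				g->first = cpu;
			g->last = cpu;
		}
	}
}

/*
 * Hot path: return the least loaded CPU in the group, or -1 if the group
 * is empty.  The group (and its 32-byte mask) is passed by pointer, and
 * the loop is bounded by the precomputed first/last instead of searching
 * the mask for its first set bit on every call.
 */
static int
toy_search_lowest(const struct toy_group *g, const int *load)
{
	int best = -1;

	if (g->first < 0)
		return (-1);
	for (int cpu = g->first; cpu <= g->last; cpu++) {
		if (!toy_isset(&g->mask, cpu))
			continue;
		if (best < 0 || load[cpu] < load[best])
			best = cpu;
	}
	return (best);
}
```

The trade-off in this sketch is two extra integers per group in exchange for a hot-path loop that never scans outside the group's range; whether the real scheduler code is laid out this way should be checked against the commit itself.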