Date:      Wed, 28 Jul 2021 23:07:45 -0700
From:      Kevin Bowling <kevin.bowling@kev009.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org,  src-committers@freebsd.org
Subject:   Re: git: aefe0a8c32d3 - main - Refactor/optimize cpu_search_*().
Message-ID:  <CAK7dMtBATCR=SRW3MqLQx9e878=wi-d60neCzZLmiRm3k_o8YQ@mail.gmail.com>
In-Reply-To: <202107290200.16T20XOM038857@gitrepo.freebsd.org>
References:  <202107290200.16T20XOM038857@gitrepo.freebsd.org>

On Wed, Jul 28, 2021 at 7:00 PM Alexander Motin <mav@freebsd.org> wrote:

> The branch main has been updated by mav:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=aefe0a8c32d370f2fdd0d0771eb59f8845beda17
>
> commit aefe0a8c32d370f2fdd0d0771eb59f8845beda17
> Author:     Alexander Motin <mav@FreeBSD.org>
> AuthorDate: 2021-07-29 01:18:50 +0000
> Commit:     Alexander Motin <mav@FreeBSD.org>
> CommitDate: 2021-07-29 02:00:29 +0000
>
>     Refactor/optimize cpu_search_*().
>
>     Remove cpu_search_both(), unused for many years.  Without it there is
>     less sense for the trick of compiling common cpu_search() into separate
>     cpu_search_lowest() and cpu_search_highest(), so split them completely,
>     making code more readable.  While there, split iteration over children
>     groups and CPUs, complicating code for very small deduplication.
>
>     Stop passing cpuset_t arguments by value and avoid some manipulations.
>     Since MAXCPU bump from 64 to 256, what was a single register turned
>     into 32-byte memory array, requiring memory allocation and accesses.
>     Splitting struct cpu_search into parameter and result parts allows to
>     even more reduce stack usage, since the first can be passed through
>     on recursion.
>
>     Remove CPU_FFS() from the hot paths, precalculating first and last CPU
>     for each CPU group in advance during initialization.  Again, it was
>     not a problem for 64 CPUs before, but for 256 FFS needs much more code.
>
>     With these changes on 80-thread system doing ~260K uncached ZFS reads
>     per second I observe ~30% reduction of time spent in cpu_search_*().
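For readers following along, the precalculation described in the quoted commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the commit: the names `cpu_mask`, `cpu_group_sketch`, `group_init()`, and `group_lowest()` are hypothetical, and the 256-bit mask only mimics the size of a cpuset_t with MAXCPU at 256. The idea shown is that the first and last CPU of a group are computed once at initialization, so the hot-path search iterates over a known range and takes the mask by pointer rather than by value.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 256-CPU set: 4 x 64-bit words (32 bytes). */
#define NCPU   256
#define NWORDS (NCPU / 64)

struct cpu_mask {
    uint64_t bits[NWORDS];
};

/* Hypothetical CPU group with a cached first/last CPU index. */
struct cpu_group_sketch {
    struct cpu_mask mask;
    int first;   /* lowest CPU in the group, -1 if the group is empty */
    int last;    /* highest CPU in the group, -1 if the group is empty */
};

static void
mask_set(struct cpu_mask *m, int cpu)
{
    m->bits[cpu / 64] |= (uint64_t)1 << (cpu % 64);
}

/* The mask is passed by pointer, so no 32-byte copy is made per call. */
static int
mask_isset(const struct cpu_mask *m, int cpu)
{
    return (int)((m->bits[cpu / 64] >> (cpu % 64)) & 1);
}

/* One-time setup: scan the whole mask and cache the first and last CPU. */
static void
group_init(struct cpu_group_sketch *g)
{
    g->first = -1;
    g->last = -1;
    for (int cpu = 0; cpu < NCPU; cpu++) {
        if (mask_isset(&g->mask, cpu)) {
            if (g->first < 0)
                g->first = cpu;
            g->last = cpu;
        }
    }
}

/*
 * Hot path: return the least-loaded CPU in the group, or -1 if the group
 * is empty. Only the cached [first, last] range is visited, so no
 * find-first-set scan over the full mask is needed here.
 */
static int
group_lowest(const struct cpu_group_sketch *g, const int *load)
{
    int best = -1;

    for (int cpu = g->first; cpu >= 0 && cpu <= g->last; cpu++) {
        if (!mask_isset(&g->mask, cpu))
            continue;
        if (best < 0 || load[cpu] < load[best])
            best = cpu;
    }
    return best;
}
```

The cost of the full scan moves into `group_init()`, which runs once per group, while `group_lowest()` pays only for the CPUs between the cached bounds.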


Nice! I recall seeing contention here on other workloads on high core count
systems.

Regards,
Kevin


