Date: Sat, 11 Jul 2015 11:29:19 -0500
From: Alan Cox <alc@rice.edu>
To: Adrian Chadd <adrian@FreeBSD.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r285387 - in head: lib/libc/sys share/man/man4 sys/conf sys/kern sys/sys sys/vm usr.bin usr.bin/numactl
Message-ID: <55A1445F.50901@rice.edu>
In-Reply-To: <201507111521.t6BFLcrv039934@repo.freebsd.org>
References: <201507111521.t6BFLcrv039934@repo.freebsd.org>
On 07/11/2015 10:21, Adrian Chadd wrote:
> Author: adrian
> Date: Sat Jul 11 15:21:37 2015
> New Revision: 285387
> URL: https://svnweb.freebsd.org/changeset/base/285387
>
> Log:
>   Add an initial NUMA affinity/policy configuration for threads and processes.
>
>   This is based on work done by jeff@ and jhb@, as well as the numa.diff
>   patch that has been circulating when someone asks for first-touch NUMA
>   on -10 or -11.
>
>   * Introduce a simple set of VM policy and iterator types.
>   * tie the policy types into the vm_phys path for now, mirroring how
>     the initial first-touch allocation work was enabled.
>   * add syscalls to control changing thread and process defaults.
>   * add a global NUMA VM domain policy.
>   * implement a simple cascade policy order - if a thread policy exists, use it;
>     if a process policy exists, use it; use the default policy.
>   * processes inherit policies from their parent processes, threads inherit
>     policies from their parent threads.
>   * add a simple tool (numactl) to query and modify default thread/process
>     policies.
>   * add documentation for the new syscalls, for numa and for numactl.
>   * re-enable first touch NUMA again by default, as now policies can be
>     set in a variety of methods.
>
>   This is only relevant for very specific workloads.
>
>   This doesn't pretend to be a final NUMA solution.
>
>   The previous defaults in -HEAD (with MAXMEMDOM set) can be achieved by
>   'sysctl vm.default_policy=rr'.
>
>   This is only relevant if MAXMEMDOM is set to something other than 1.
>   I.e., if you're using GENERIC or a modified kernel with non-NUMA, then
>   this is a glorified no-op for you.
>
>   Thank you to Norse Corp for giving me access to rather large
>   (for FreeBSD!) NUMA machines in order to develop and verify this.
>
>   Thank you to Dell for providing me with dual socket sandybridge
>   and westmere v3 hardware to do NUMA development with.
>
>   Thank you to Scott Long at Netflix for providing me with access
>   to the two-socket, four-domain haswell v3 hardware.
>
>   Thank you to Peter Holm for running the stress testing suite
>   against the NUMA branch during various stages of development!
>
>   Tested:
>
>   * MIPS (regression testing; non-NUMA)
>   * i386 (regression testing; non-NUMA GENERIC)
>   * amd64 (regression testing; non-NUMA GENERIC)
>   * westmere, 2 socket (thank you norse!)
>   * sandy bridge, 2 socket (thank you dell!)
>   * ivy bridge, 2 socket (thank you norse!)
>   * westmere-EX, 4 socket / 1TB RAM (thank you norse!)
>   * haswell, 2 socket (thank you norse!)
>   * haswell v3, 2 socket (thank you dell)
>   * haswell v3, 2x18 core (thank you scott long / netflix!)
>
>   * Peter Holm ran a stress test suite on this work and found one
>     issue, but has not been able to verify it (it doesn't look NUMA
>     related, and he only saw it once over many testing runs.)
>
>   * I've tested bhyve instances running in fixed NUMA domains and cpusets;
>     all seems to work correctly.
>
>   Verified:
>
>   * intel-pcm - pcm-numa.x and pcm-memory.x, whilst selecting different
>     NUMA policies for processes under test.
>
>   Review:
>
>   This was reviewed through phabricator (https://reviews.freebsd.org/D2559)
>   as well as privately and via emails to freebsd-arch@.  The git history
>   with specific attributes is available at https://github.com/erikarn/freebsd/
>   in the NUMA branch (https://github.com/erikarn/freebsd/compare/local/adrian_numa_policy).
>
>   This has been reviewed by a number of people (stas, rpaulo, kib, ngie,
>   wblock) but has not achieved a clear consensus.  My hope is that with further
>   exposure and testing more functionality can be implemented and evaluated.
>
>   Notes:
>
>   * The VM doesn't handle unbalanced domains very well, and if you have an overly
>     unbalanced memory setup whilst under high memory pressure, VM page allocation
>     may fail, leading to a kernel panic.  This was a problem in the past, but it's
>     much more easily triggered now with these tools.
>

For the record, no, it doesn't panic.  Both the first-touch scheme in 9.x and the round-robin scheme in 10.x fall back to allocating from a different domain until some page is found.

Alan
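
To make that concrete, here is a minimal user-space sketch in C of the two mechanisms discussed above: the cascade policy order from the commit log (thread policy, then process policy, then the global default) and the cross-domain fallback Alan describes.  Every name in it (resolve_preferred_domain(), alloc_from_domain(), numa_alloc_page(), free_pages[]) is a hypothetical stand-in for illustration only, not the vm_phys code or the new syscall interface.

/*
 * Sketch only: hypothetical names, not the r285387 kernel interfaces.
 */
#include <stdio.h>

#define NDOMAIN		2
#define POLICY_NONE	(-1)		/* "no policy set at this level" */

static int free_pages[NDOMAIN] = { 0, 3 };	/* domain 0 is exhausted */

/* Cascade: thread policy wins, then process policy, then the global default. */
static int
resolve_preferred_domain(int thread_policy, int proc_policy, int default_policy)
{
	if (thread_policy != POLICY_NONE)
		return (thread_policy);
	if (proc_policy != POLICY_NONE)
		return (proc_policy);
	return (default_policy);
}

/* Pretend per-domain allocator: 1 on success, 0 if the domain is empty. */
static int
alloc_from_domain(int dom)
{
	if (free_pages[dom] == 0)
		return (0);
	free_pages[dom]--;
	return (1);
}

/* Try the preferred domain first, then every other domain in turn. */
static int
numa_alloc_page(int preferred)
{
	if (alloc_from_domain(preferred))
		return (preferred);
	for (int dom = 0; dom < NDOMAIN; dom++)
		if (dom != preferred && alloc_from_domain(dom))
			return (dom);
	return (-1);			/* truly out of pages everywhere */
}

int
main(void)
{
	/* No thread policy; the process policy pins domain 0; default is 1. */
	int preferred = resolve_preferred_domain(POLICY_NONE, 0, 1);

	/* Domain 0 is empty, so the page comes from domain 1 instead. */
	printf("allocated from domain %d\n", numa_alloc_page(preferred));
	return (0);
}

Built and run as-is, this prints "allocated from domain 1": the process policy prefers domain 0, but that domain is exhausted, so the allocator falls back to domain 1 rather than failing.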