Date:      Tue, 21 Apr 2015 20:03:59 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: RFT: numa policy branch
Message-ID:  <CAJ-VmokPd=CUAfqmjWPns%2Bpj6zKbpF55tDn2_u8JPNzaK7F1Pw@mail.gmail.com>
In-Reply-To: <CAJ-VmomL9hZZHPtZ3%2BTdujHmo5UQfFhm59vQKUbxW%2B%2B-TGobmg@mail.gmail.com>
References:  <CAJ-VmomL9hZZHPtZ3%2BTdujHmo5UQfFhm59vQKUbxW%2B%2B-TGobmg@mail.gmail.com>

Oh, and the branch:

https://github.com/erikarn/freebsd/tree/local/adrian_numa_policy


On 21 April 2015 at 19:42, Adrian Chadd <adrian@freebsd.org> wrote:
> Hi!
>
> I have a branch off of -HEAD that implements the bare minimum for
> default, per-thread, per-process NUMA allocation policies and
> associated syscalls / tool to manipulate it.
>
> You can all thank Norse for providing me with kit to test this on
> (including a Dell R910, which is a quad-socket 40-core, 80-thread
> Westmere-EX box with ~1TB of RAM) and time to do the work, and Dell
> for loaning me way too much hardware to make this happen.
>
> It's not ready for formal review for commit (hence the "RFT"), but
> it works well enough in my local test setup that I think it's worth
> sharing.
>
> What it does:
>
> * adds VM domain policy and iterator types;
> * the system default policy is "first-touch-round-robin", which is
> "first-touch, and if that fails, round-robin to the other domains";
> * there are per-proc and per-thread policy entries in struct proc /
> struct thread - enough to play with, but certainly not in its final
> form;
> * two syscalls - numa_setaffinity() and numa_getaffinity();
> * a very basic numactl program, complete with adrian-standard "MAN=".
>
> This doesn't teach ULE or the proc/thread stuff anything about NUMA
> /scheduling/. That's a whole different ballgame. It also has nothing
> to do with kernel memory allocation - no ULE, no contigmalloc, no
> driver affinity, etc. This is purely for controlling the initial page
> allocation for processes - which for a lot of NUMA workloads is all
> that's needed.
>
> How to use:
>
> * look at the NUMA config file. You have to add in memory domain
> support or you won't get the domains set up;
> * sysctl vm.default_domain controls the default policy. "rr",
> "first-touch-rr" and "first-touch" are supported here;
> * numactl takes --tid=tid or --pid=pid, then --policy=policy,
> --domain=domain, --get or --set, and an optional command to run,
> much like cpuset.
>
> So, some examples:
>
> numactl --pid=1 --get
>
> Get the current policy for the given PID:
>
> # ./numactl --pid=1 --get
>   Policy: none; domain: -1
>
> Run a job with a fixed-domain allocation from domain 1, but pinned to
> CPU 0 (which on my system is in domain 0, so it's 100% remote memory
> access):
>
> $ cpuset -l 0 ./numactl --policy=fixed-domain --domain=1 ~/himenobmtxpa xl 0
>
> Run a job with round-robin:
>
> $ cpuset -l 0 ./numactl --policy=rr ~/himenobmtxpa xl 0
>
> I'm using the 'pcm-numa.x' tool from the intel-pcm package to ensure
> that memory accesses are correctly local/remote/round-robin as
> appropriate.
>
> I'd appreciate feedback and any improvements (yes, including a
> manpage) that people have.
>
> Thanks!
>
>
>
> -adrian
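
The policies named in the quoted message ("rr", "first-touch", "first-touch-rr" and "fixed-domain") can be illustrated with a small userland sketch. This is not code from the branch: NDOMAINS, free_pages, rr_cursor, the enum policy names and alloc_page() are all hypothetical stand-ins, and the per-domain free-page counters only imitate real per-domain free lists. It assumes "first-touch" means "allocate from the domain of the CPU that touched the page".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one memory domain per socket on a quad-socket box. */
#define NDOMAINS 4

/* Hypothetical policy names, mirroring the ones mentioned in the post. */
enum policy {
	POLICY_RR,
	POLICY_FIRST_TOUCH,
	POLICY_FIRST_TOUCH_RR,
	POLICY_FIXED_DOMAIN
};

/* Free pages per domain; a toy stand-in for per-domain free lists. */
int free_pages[NDOMAINS];

/* Next domain the round-robin iterator will try. */
int rr_cursor;

/* Take one page from domain d if it has any left. */
bool
try_alloc(int d)
{
	if (free_pages[d] > 0) {
		free_pages[d]--;
		return (true);
	}
	return (false);
}

/*
 * Walk the domains round-robin, skipping 'skip' (pass -1 to skip nothing).
 * Returns the domain that satisfied the request, or -1 if none could.
 */
int
rr_alloc(int skip)
{
	for (int i = 0; i < NDOMAINS; i++) {
		int d = rr_cursor;

		rr_cursor = (rr_cursor + 1) % NDOMAINS;
		if (d != skip && try_alloc(d))
			return (d);
	}
	return (-1);
}

/*
 * Pick a domain for one page under policy p.  local_domain is the domain
 * of the CPU touching the page; fixed_domain is only used by fixed-domain.
 * Returns the chosen domain, or -1 if the allocation fails.
 */
int
alloc_page(enum policy p, int local_domain, int fixed_domain)
{
	assert(local_domain >= 0 && local_domain < NDOMAINS);

	switch (p) {
	case POLICY_FIXED_DOMAIN:
		assert(fixed_domain >= 0 && fixed_domain < NDOMAINS);
		return (try_alloc(fixed_domain) ? fixed_domain : -1);
	case POLICY_FIRST_TOUCH:
		return (try_alloc(local_domain) ? local_domain : -1);
	case POLICY_FIRST_TOUCH_RR:
		/* Local first; if that fails, round-robin over the others. */
		if (try_alloc(local_domain))
			return (local_domain);
		return (rr_alloc(local_domain));
	case POLICY_RR:
		return (rr_alloc(-1));
	}
	return (-1);
}
```

In this toy model, a thread on a CPU in domain 0 using the first-touch-rr policy keeps getting domain 0 pages until that domain runs dry, and only then spills over to the remaining domains in turn. The fixed-domain case corresponds to the "pinned to CPU 0, memory from domain 1" experiment in the quoted message, where every access is remote.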
