Date: Tue, 21 Apr 2015 20:03:59 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: RFT: numa policy branch
Message-ID: <CAJ-VmokPd=CUAfqmjWPns%2Bpj6zKbpF55tDn2_u8JPNzaK7F1Pw@mail.gmail.com>
In-Reply-To: <CAJ-VmomL9hZZHPtZ3%2BTdujHmo5UQfFhm59vQKUbxW%2B%2B-TGobmg@mail.gmail.com>
References: <CAJ-VmomL9hZZHPtZ3%2BTdujHmo5UQfFhm59vQKUbxW%2B%2B-TGobmg@mail.gmail.com>

OH, and the branch:

https://github.com/erikarn/freebsd/tree/local/adrian_numa_policy

On 21 April 2015 at 19:42, Adrian Chadd <adrian@freebsd.org> wrote:
> Hi!
>
> I have a branch off of -HEAD that implements the bare minimum for
> default, per-thread, per-process NUMA allocation policies and
> associated syscalls / tool to manipulate it.
>
> You can all thank Norse for providing me with kit to test this on
> (including a Dell R910, which is a quad-socket 40-core, 80-thread
> westmere-EX box with ~1TB of RAM) and time to do the work, and Dell
> for loaning me way too much hardware to make this happen.
>
> It's not ready for formal review for commit (hence why this is a
> "RFT") but it works well enough in my local test setup that I think
> it's worth sharing.
>
> What it does:
>
> * adds VM domain policy and iterator types;
> * the system default policy is "first-touch-round-robin", which is
> "first-touch, and if fail, round-robin to other domains";
> * there's per-proc and per-thread policy entries in struct proc /
> struct thread - enough to play with, but certainly not in its final
> form;
> * two syscalls - numa_setaffinity() and numa_getaffinity();
> * a very basic numactl program, complete with adrian-standard "MAN=".
>
> This doesn't teach ULE or the proc/thread stuff anything about NUMA
> /scheduling/. That's a whole different ballgame. It also has nothing
> to do with kernel memory allocation - no ULE, no contigmalloc, no
> driver affinity, etc. This is purely for controlling the initial page
> allocation for processes - which for a lot of NUMA workloads is all it
> needs.
>
> How to use:
>
> * look at the NUMA config file. You have to add in memory domain
> support or you won't get the domains setup;
> * sysctl vm.default_domain controls the default policy. "rr",
> "first-touch-rr" and "first-touch" are supported here.
> * numactl (--tid=tid or --pid=pid) --policy=policy, --domain=domain,
> (--get or --set) (optional command) - like cpuset
>
> So, some examples:
>
> numactl --pid=1 --get
>
> Get the current policy for the given PID:
>
> # ./numactl --pid=1 --get
> Policy: none; domain: -1
>
> Run a job with a fixed-domain allocation from domain 1, but pinned to
> CPU 0 (which on my system is in domain 0, so it's 100% remote memory
> access):
>
> $ cpuset -l 0 ./numactl --policy=fixed-domain --domain=1 ~/himenobmtxpa xl 0
>
> Run a job with round-robin:
>
> $ cpuset -l 0 ./numactl --policy=rr ~/himenobmtxpa xl 0
>
> I'm using the 'pcm-numa.x' tool from the intel-pcm package to ensure
> that memory accesses are correctly local/remote/round-robin as
> appropriate.
>
> I'd appreciate feedback and any improvements (yes, including a
> manpage) that people have.
>
> Thanks!
>
>
>
> -adrian
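
For readers unfamiliar with the policy names quoted above, the following is a small user-space model of how each policy might order the memory domains tried for a single page allocation. It is only an illustrative sketch and not code from the branch: the function names `domain_order` and `allocate`, the `rr_start` cursor, and the free-page list are all hypothetical. The behaviour of "first-touch-rr" follows the description in the message (try the local domain, and on failure round-robin over the others); treating "rr", "first-touch" and "fixed-domain" as shown here is an assumption.

```python
from typing import Iterator, List, Optional


def domain_order(policy: str, n_domains: int, local: int,
                 fixed: Optional[int] = None,
                 rr_start: int = 0) -> Iterator[int]:
    """Yield the domains to try, in order, for one page allocation.

    Hypothetical model only; the branch's real iterator may differ.
    """
    if policy == "first-touch":
        # Assumed: only the domain local to the faulting CPU is tried.
        yield local
    elif policy == "fixed-domain":
        # Assumed: only the explicitly requested domain is tried.
        if fixed is None:
            raise ValueError("fixed-domain needs a domain")
        yield fixed
    elif policy == "rr":
        # Assumed: rotate through every domain from a moving cursor.
        for i in range(n_domains):
            yield (rr_start + i) % n_domains
    elif policy == "first-touch-rr":
        # As described in the message: local first, then round-robin
        # over the remaining domains.
        yield local
        for i in range(n_domains):
            d = (rr_start + i) % n_domains
            if d != local:
                yield d
    else:
        raise ValueError("unknown policy: " + policy)


def allocate(policy: str, free_pages: List[int], local: int,
             fixed: Optional[int] = None,
             rr_start: int = 0) -> Optional[int]:
    """Take one page from the first candidate domain that has a free page.

    Returns the chosen domain, or None if every candidate is exhausted.
    """
    for d in domain_order(policy, len(free_pages), local, fixed, rr_start):
        if free_pages[d] > 0:
            free_pages[d] -= 1
            return d
    return None
```

In this model, a thread running in domain 2 on a four-domain machine with `first-touch-rr` would try domain 2 first and then domains 0, 1 and 3 in turn, whereas plain `first-touch` would simply fail once domain 2 has no free pages.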