From: Adrian Chadd
To: "freebsd-arch@freebsd.org"
Date: Tue, 21 Apr 2015 20:03:59 -0700
Subject: Re: RFT: numa policy branch

OH, and the branch:

https://github.com/erikarn/freebsd/tree/local/adrian_numa_policy

On 21 April 2015 at 19:42, Adrian Chadd wrote:
> Hi!
>
> I have a branch off of -HEAD that implements the bare minimum for
> default, per-thread, per-process NUMA allocation policies and
> associated syscalls / tool to manipulate it.
>
> You can all thank Norse for providing me with kit to test this on
> (including a Dell R910, which is a quad-socket 40-core, 80-thread
> westmere-EX box with ~1TB of RAM) and time to do the work, and Dell
> for loaning me way too much hardware to make this happen.
>
> It's not ready for formal review for commit (hence why this is a
> "RFT") but it works well enough in my local test setup that I think
> it's worth sharing.
>
> What it does:
>
> * adds VM domain policy and iterator types;
> * the system default policy is "first-touch-round-robin", which is
> "first-touch, and if fail, round-robin to other domains";
> * there's per-proc and per-thread policy entries in struct proc /
> struct thread - enough to play with, but certainly not in its final
> form;
> * two syscalls - numa_setaffinity() and numa_getaffinity();
> * a very basic numactl program, complete with adrian-standard "MAN=".
>
> This doesn't teach ULE or the proc/thread stuff anything about NUMA
> /scheduling/. That's a whole different ballgame. It also has nothing
> to do with kernel memory allocation - no ULE, no contigmalloc, no
> driver affinity, etc. This is purely for controlling the initial page
> allocation for processes - which for a lot of NUMA workloads is all it
> needs.
>
> How to use:
>
> * look at the NUMA config file. You have to add in memory domain
> support or you won't get the domains setup;
> * sysctl vm.default_domain controls the default policy. "rr",
> "first-touch-rr" and "first-touch" are supported here.
> * numactl (--tid=tid or --pid=pid) --policy=policy, --domain=domain,
> (--get or --set) (optional command) - like cpuset
>
> So, some examples:
>
> numactl --pid=1 --get
>
> Get the current policy for the given PID:
>
> # ./numactl --pid=1 --get
> Policy: none; domain: -1
>
> Run a job with a fixed-domain allocation from domain 1, but pinned to
> CPU 0 (which on my system is in domain 0, so it's 100% remote memory
> access):
>
> $ cpuset -l 0 ./numactl --policy=fixed-domain --domain=1 ~/himenobmtxpa xl 0
>
> Run a job with round-robin:
>
> $ cpuset -l 0 ./numactl --policy=rr ~/himenobmtxpa xl 0
>
> I'm using the 'pcm-numa.x' tool from the intel-pcm package to ensure
> that memory accesses are correctly local/remote/round-robin as
> appropriate.
>
> I'd appreciate feedback and any improvements (yes, including a
> manpage) that people have.
>
> Thanks!
>
>
>
> -adrian
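
For anyone trying to picture the policies named in the quoted mail, here is a rough userland model of them. It is only a sketch in Python, not code from the branch: the names Domain, PolicyAllocator and alloc_page are made up for illustration, and the real implementation works on VM domain iterators inside the kernel. Two behaviours are assumptions on my part rather than statements from the mail: that plain "first-touch" and "fixed-domain" fail outright when the chosen domain is empty, and that the round-robin fallback skips the domain that was just tried.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """Toy memory domain holding a count of free pages (hypothetical)."""
    free_pages: int

    def try_alloc(self) -> bool:
        """Take one page if any are left; report whether that worked."""
        if self.free_pages > 0:
            self.free_pages -= 1
            return True
        return False


class PolicyAllocator:
    """Toy model of the policies named in the mail; not the kernel API."""

    def __init__(self, domains: List[Domain]) -> None:
        self.domains = domains
        self.rr_cursor = 0  # next domain the round-robin walk will try

    def _round_robin(self, skip: Optional[int] = None) -> Optional[int]:
        """Try each domain once, starting at the cursor, ignoring `skip`."""
        n = len(self.domains)
        for _ in range(n):
            d = self.rr_cursor
            self.rr_cursor = (self.rr_cursor + 1) % n
            if d != skip and self.domains[d].try_alloc():
                return d
        return None

    def alloc_page(self, policy: str, local_domain: int,
                   fixed_domain: int = 0) -> Optional[int]:
        """Return the domain a page came from, or None if allocation failed."""
        if policy == "rr":
            return self._round_robin()
        if policy == "fixed-domain":
            # Assumption: no fallback when the fixed domain is exhausted.
            ok = self.domains[fixed_domain].try_alloc()
            return fixed_domain if ok else None
        if policy in ("first-touch", "first-touch-rr"):
            # First touch: prefer the domain of the CPU doing the access.
            if self.domains[local_domain].try_alloc():
                return local_domain
            if policy == "first-touch-rr":
                # "...and if fail, round-robin to other domains"
                return self._round_robin(skip=local_domain)
            return None  # Assumption: strict first-touch does not fall back.
        raise ValueError(f"unknown policy: {policy}")
```

With two domains, a thread running in domain 0 under "first-touch-rr" gets local pages until domain 0 runs dry and is then served from domain 1, which matches the fallback described in the "What it does" list.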