Message-Id: <201803242358.w2ONwiuu051354@repo.freebsd.org>
From: Jeff Roberson
Date: Sat, 24 Mar 2018 23:58:44 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r331508 - in head: lib/libc/sys share/man/man9 usr.bin/cpuset

Author: jeff
Date: Sat Mar 24 23:58:44 2018
New Revision: 331508
URL: https://svnweb.freebsd.org/changeset/base/331508

Log:
  Document new NUMA related syscalls and utility options.

  Sponsored by:	Netflix, Dell/EMC Isilon

Modified:
  head/lib/libc/sys/Makefile.inc
  head/lib/libc/sys/cpuset.2
  head/lib/libc/sys/cpuset_getaffinity.2
  head/share/man/man9/Makefile
  head/share/man/man9/malloc.9
  head/share/man/man9/zone.9
  head/usr.bin/cpuset/cpuset.1

Modified: head/lib/libc/sys/Makefile.inc
==============================================================================
--- head/lib/libc/sys/Makefile.inc	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/Makefile.inc	Sat Mar 24 23:58:44 2018	(r331508)
@@ -174,6 +174,7 @@ MAN+=	abort2.2 \
 	connectat.2 \
 	cpuset.2 \
 	cpuset_getaffinity.2 \
+	cpuset_getdomain.2 \
 	dup.2 \
 	execve.2 \
 	_exit.2 \
@@ -371,6 +372,7 @@ MLINKS+=nanosleep.2 clock_nanosleep.2
 MLINKS+=cpuset.2 cpuset_getid.2 \
 	cpuset.2 cpuset_setid.2
 MLINKS+=cpuset_getaffinity.2 cpuset_setaffinity.2
+MLINKS+=cpuset_getdomain.2 cpuset_setdomain.2
 MLINKS+=dup.2 dup2.2
 MLINKS+=execve.2 fexecve.2
 MLINKS+=extattr_get_file.2 extattr.2 \

Modified: head/lib/libc/sys/cpuset.2
==============================================================================
--- head/lib/libc/sys/cpuset.2	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/cpuset.2	Sat Mar 24 23:58:44 2018	(r331508)
@@ -48,21 +48,21 @@
 The
 .Nm
 family of system calls allow applications to control sets of processors and
-assign processes and threads to these sets.
-Processor sets contain lists of CPUs that members may run on and exist only
-as long as some process is a member of the set.
+memory domains and assign processes and threads to these sets.
+Processor sets contain lists of CPUs and domains that members may run on
+and exist only as long as some process is a member of the set.
 All processes in the system have an assigned set.
 The default set for all processes in the system is the set numbered 1.
 Threads belong to the same set as the process which contains them,
 however, they may further restrict their set with the anonymous
-per-thread mask.
+per-thread mask to bind to a specific CPU or subset of CPUs and memory domains.
 .Pp
 Sets are referenced by a number of type
 .Ft cpuset_id_t .
 Each thread has a root set, an assigned set, and an anonymous mask.
 Only the root and assigned sets are numbered.
-The root set is the set of all CPUs available in the system or in the
-system partition the thread is running in.
+The root set is the set of all CPUs and memory domains available in the system
+or in the system partition the thread is running in.
 The assigned set is a subset of the root set and is administratively
 assignable on a per-process basis.
 Many processes and threads may be members of a numbered set.
@@ -72,7 +72,8 @@ set.
 It is intended that administrators will manipulate numbered sets using
 .Xr cpuset 1
 while application developers will manipulate anonymous sets using
-.Xr cpuset_setaffinity 2 .
+.Xr cpuset_setaffinity 2 and
+.Xr cpuset_setdomain 2 .
 .Pp
 To select the correct set a value of type
 .Ft cpulevel_t
@@ -175,9 +176,10 @@ with a process or thread is unsupported since this
 references the unnumbered anonymous mask.
 .Pp
 The actual contents of the sets may be retrieved or manipulated using
-.Xr cpuset_getaffinity 2
-and
-.Xr cpuset_setaffinity 2 .
+.Xr cpuset_getaffinity 2 ,
+.Xr cpuset_setaffinity 2 ,
+.Xr cpuset_getdomain 2 , and
+.Xr cpuset_setdomain 2 .
 See those manual pages for more detail.
 .Sh RETURN VALUES
 .Rv -std
@@ -220,6 +222,8 @@ for allocation.
 .Xr cpuset 1 ,
 .Xr cpuset_getaffinity 2 ,
 .Xr cpuset_setaffinity 2 ,
+.Xr cpuset_getdomain 2 ,
+.Xr cpuset_setdomain 2 ,
 .Xr pthread_affinity_np 3 ,
 .Xr pthread_attr_affinity_np 3 ,
 .Xr cpuset 9

Modified: head/lib/libc/sys/cpuset_getaffinity.2
==============================================================================
--- head/lib/libc/sys/cpuset_getaffinity.2	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/lib/libc/sys/cpuset_getaffinity.2	Sat Mar 24 23:58:44 2018	(r331508)
@@ -160,6 +160,8 @@ See
 .Xr cpuset 2 ,
 .Xr cpuset_getid 2 ,
 .Xr cpuset_setid 2 ,
+.Xr cpuset_getdomain 2 ,
+.Xr cpuset_setdomain 2 ,
 .Xr pthread_affinity_np 3 ,
 .Xr pthread_attr_affinity_np 3 ,
 .Xr cpuset 9

Modified: head/share/man/man9/Makefile
==============================================================================
--- head/share/man/man9/Makefile	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/Makefile	Sat Mar 24 23:58:44 2018	(r331508)
@@ -1271,6 +1271,8 @@ MLINKS+=make_dev.9 destroy_dev.9 \
 	make_dev.9 make_dev_p.9 \
 	make_dev.9 make_dev_s.9
 MLINKS+=malloc.9 free.9 \
+	malloc.9 malloc_domain.9 \
+	malloc.9 free_domain.9 \
 	malloc.9 mallocarray.9 \
 	malloc.9 MALLOC_DECLARE.9 \
 	malloc.9 MALLOC_DEFINE.9 \
@@ -2213,10 +2215,12 @@ MLINKS+=vslock.9 vsunlock.9
 MLINKS+=zone.9 uma.9 \
 	zone.9 uma_zalloc.9 \
 	zone.9 uma_zalloc_arg.9 \
+	zone.9 uma_zalloc_domain.9 \
 	zone.9 uma_zcreate.9 \
 	zone.9 uma_zdestroy.9 \
 	zone.9 uma_zfree.9 \
 	zone.9 uma_zfree_arg.9 \
+	zone.9 uma_zfree_domain.9 \
 	zone.9 uma_zone_get_cur.9 \
 	zone.9 uma_zone_get_max.9 \
 	zone.9 uma_zone_set_max.9 \

Modified: head/share/man/man9/malloc.9
==============================================================================
--- head/share/man/man9/malloc.9	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/malloc.9	Sat Mar 24 23:58:44 2018	(r331508)
@@ -46,9 +46,13 @@
 .Ft void *
 .Fn malloc "size_t size" "struct malloc_type *type" "int flags"
 .Ft void *
+.Fn malloc_domain "size_t size" "struct malloc_type *type" "int domain" "int flags"
+.Ft void *
 .Fn mallocarray "size_t nmemb" "size_t size" "struct malloc_type *type" "int flags"
 .Ft void
 .Fn free "void *addr" "struct malloc_type *type"
+.Ft void
+.Fn free_domain "void *addr" "struct malloc_type *type"
 .Ft void *
 .Fn realloc "void *addr" "size_t size" "struct malloc_type *type" "int flags"
 .Ft void *
@@ -64,6 +68,14 @@ The
 function allocates uninitialized memory in kernel address space for an
 object whose size is specified by
 .Fa size .
+.Pp
+The
+.Fn malloc_domain
+variant allocates the object from the specified memory domain. Memory allocated
+with this function should be returned with
+.Fn free_domain .
+See
+.Xr numa 9 for more details.
 .Pp
 The
 .Fn mallocarray

Modified: head/share/man/man9/zone.9
==============================================================================
--- head/share/man/man9/zone.9	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/share/man/man9/zone.9	Sat Mar 24 23:58:44 2018	(r331508)
@@ -32,8 +32,10 @@
 .Nm uma_zcreate ,
 .Nm uma_zalloc ,
 .Nm uma_zalloc_arg ,
+.Nm uma_zalloc_domain ,
 .Nm uma_zfree ,
 .Nm uma_zfree_arg ,
+.Nm uma_zfree_domain ,
 .Nm uma_zdestroy ,
 .Nm uma_zone_set_max ,
 .Nm uma_zone_get_max ,
@@ -55,11 +57,15 @@
 .Fn uma_zalloc "uma_zone_t zone" "int flags"
 .Ft "void *"
 .Fn uma_zalloc_arg "uma_zone_t zone" "void *arg" "int flags"
+.Ft "void *"
+.Fn uma_zalloc_domain "uma_zone_t zone" "void *arg" "int domain" "int flags"
 .Ft void
 .Fn uma_zfree "uma_zone_t zone" "void *item"
 .Ft void
 .Fn uma_zfree_arg "uma_zone_t zone" "void *item" "void *arg"
 .Ft void
+.Fn uma_zfree_domain "uma_zone_t zone" "void *item" "void *arg"
+.Ft void
 .Fn uma_zdestroy "uma_zone_t zone"
 .Ft int
 .Fn uma_zone_set_max "uma_zone_t zone" "int nitems"
@@ -78,10 +84,13 @@
 .Fn SYSCTL_ADD_UMA_CUR ctx parent nbr name access zone descr
 .Sh DESCRIPTION
 The zone allocator provides an efficient interface for managing
-dynamically-sized collections of items of similar size.
+dynamically-sized collections of items of identical size.
 The zone allocator can work with preallocated zones as well as with
 runtime-allocated ones, and is therefore available much earlier in the
-boot process than other memory management routines.
+boot process than other memory management routines. The zone allocator
+provides per-cpu allocation caches with linear scalability on SMP
+systems as well as round-robin and first-touch policies for NUMA
+systems.
 .Pp
 A zone is an extensible collection of items of identical size.
 The zone allocator keeps track of which items are in use and which
@@ -209,6 +218,11 @@ The zone is for the
 subsystem.
 .It Dv UMA_ZONE_VM
 The zone is for the VM subsystem.
+.It Dv UMA_ZONE_NUMA
+The zone should use a first-touch NUMA policy rather than the round-robin
+default. Callers that do not free memory on the same domain it is allocated
+from will cause mixing in per-cpu caches. See
+.Xr numa 9 for more details.
 .El
 .Pp
 To allocate an item from a zone, simply call
@@ -243,12 +257,21 @@ The variations
 .Fn uma_zalloc_arg
 and
 .Fn uma_zfree_arg
-allow to
+allow callers to
 specify an argument for the
 .Dv ctor
 and
 .Dv dtor
 functions, respectively.
+The
+.Fn uma_zalloc_domain
+function allows callers to specify a fixed
+.Xr numa 9 domain to allocate from. This uses a guaranteed but slow path in
+the allocator which reduces concurrency. The
+.Fn uma_zfree_domain
+function should be used to return memory allocated in this fashion. This
+function infers the domain from the pointer and does not require it as an
+argument.
 .Pp
 Created zones,
 which are empty,

Modified: head/usr.bin/cpuset/cpuset.1
==============================================================================
--- head/usr.bin/cpuset/cpuset.1	Sat Mar 24 23:26:54 2018	(r331507)
+++ head/usr.bin/cpuset/cpuset.1	Sat Mar 24 23:58:44 2018	(r331508)
@@ -34,20 +34,24 @@
 .Sh SYNOPSIS
 .Nm
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list
 .Op Fl s Ar setid
 .Ar cmd ...
 .Nm
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list
 .Op Fl s Ar setid
 .Fl p Ar pid
 .Nm
 .Op Fl c
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list
 .Fl C
 .Fl p Ar pid
 .Nm
 .Op Fl c
 .Op Fl l Ar cpu-list
+.Op Fl n Ar policy:domain-list
 .Op Fl j Ar jailid | Fl p Ar pid | Fl t Ar tid | Fl s Ar setid | Fl x Ar irq
 .Nm
 .Fl g
@@ -57,8 +61,9 @@
 The
 .Nm
 command can be used to assign processor sets to processes, run commands
-constrained to a given set or list of processors, and query information
-about processor binding, sets, and available processors in the system.
+constrained to a given set or list of processors and memory domains, and query
+information about processor binding, memory binding and policy, sets, and
+available processors and memory domains in the system.
 .Pp
 .Nm
 requires a target to modify or query.
@@ -92,6 +97,15 @@ This last set is the list of all possible CPUs in the
 queried using
 .Fl r .
 .Pp
+Most sets include NUMA memory domain and policy information. This can be
+inspected with
+.Fl g
+and set with
+.Fl n .
+This will specify which NUMA domains are visible to the process and
+affect where anonymous memory and file pages will be stored on first access.
+Files accessed first by other processes may specify conflicting policy.
+.Pp
 When running a command it may join a set specified with
 .Fl s
 otherwise a new set is created.
@@ -110,7 +124,8 @@ Create a new cpuset and assign the target process to t
 The requested operation should reference the cpuset available via the
 target specifier.
 .It Fl d Ar domain
-Specifies a NUMA domain id as the target of the operation.
+Specifies a NUMA domain id as the target of the operation. This can only
+be used to query the CPUs visible in each numbered domain.
 .It Fl g
 Causes
 .Nm
@@ -130,6 +145,13 @@ numbers separated by '-' for ranges and commas separat
 A special list of
 .Dq all
 may be specified in which case the list includes all CPUs from the root set.
+.It Fl n Ar policy:domain-list
+Specifies a list of domains and allocation policy to apply to a target. Ranges
+may be specified as in
+.Fl l .
+Valid policies include first-touch (ft), round-robin (rr), and prefer. The prefer
+policy accepts only a single domain in the set. The parent of the set is
+consulted if the preferred domain is unavailable.
 .It Fl p Ar pid
 Specifies a pid as the target of the operation.
 .It Fl s Ar setid
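The -n option takes an allocation policy and a list of NUMA domains, written in the policy:domain-list form shown in the SYNOPSIS, with ranges as in -l. As an illustration only, the following portable C sketch parses such an argument into a policy code and a bitmask of domains. The names parse_domain_arg, parse_domain, policy_from_name, the POLICY_* codes and the SKETCH_MAXDOMAIN limit are hypothetical and are not taken from the cpuset(1) sources. The sketch requires the policy prefix and does not handle forms the text above does not describe for -n (such as the "all" keyword), so real cpuset(1) behaviour may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical policy codes for this sketch; not the kernel's values. */
enum policy { POLICY_INVALID = -1, POLICY_FIRSTTOUCH, POLICY_ROUNDROBIN,
    POLICY_PREFER };

/* Assumed limit so that one bit per domain fits in a 64-bit mask. */
#define	SKETCH_MAXDOMAIN	64

/* Map a full or abbreviated policy name of length len to its code. */
static enum policy
policy_from_name(const char *name, size_t len)
{
	static const struct {
		const char	*name;
		enum policy	 policy;
	} names[] = {
		{ "first-touch", POLICY_FIRSTTOUCH },
		{ "ft", POLICY_FIRSTTOUCH },
		{ "round-robin", POLICY_ROUNDROBIN },
		{ "rr", POLICY_ROUNDROBIN },
		{ "prefer", POLICY_PREFER },
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i].name) == len &&
		    strncmp(names[i].name, name, len) == 0)
			return (names[i].policy);
	return (POLICY_INVALID);
}

/* Read one decimal domain number at *pp and advance *pp past it. */
static int
parse_domain(const char **pp, unsigned int *out)
{
	const char *p = *pp;
	unsigned int v = 0;

	if (!isdigit((unsigned char)*p))
		return (-1);
	while (isdigit((unsigned char)*p)) {
		v = v * 10 + (unsigned int)(*p++ - '0');
		if (v >= SKETCH_MAXDOMAIN)
			return (-1);
	}
	*pp = p;
	*out = v;
	return (0);
}

/*
 * Parse "policy:domain-list", e.g. "rr:0-2,4".  On success store the
 * policy and a bitmask of the selected domains and return 0; return -1
 * on a syntax error or if "prefer" is given more than one domain.
 */
static int
parse_domain_arg(const char *arg, enum policy *policyp,
    unsigned long long *maskp)
{
	const char *colon, *p;
	unsigned long long mask = 0;
	unsigned int lo, hi;
	enum policy policy;

	assert(arg != NULL && policyp != NULL && maskp != NULL);
	if ((colon = strchr(arg, ':')) == NULL)
		return (-1);
	policy = policy_from_name(arg, (size_t)(colon - arg));
	if (policy == POLICY_INVALID)
		return (-1);
	p = colon + 1;
	for (;;) {
		if (parse_domain(&p, &lo) != 0)
			return (-1);
		hi = lo;
		if (*p == '-') {
			p++;
			if (parse_domain(&p, &hi) != 0 || hi < lo)
				return (-1);
		}
		for (unsigned int d = lo; d <= hi; d++)
			mask |= 1ULL << d;
		if (*p == '\0')
			break;
		if (*p++ != ',')	/* entries are comma separated */
			return (-1);
	}
	/* Per the text above, the prefer policy accepts only one domain. */
	if (policy == POLICY_PREFER && (mask & (mask - 1)) != 0)
		return (-1);
	*policyp = policy;
	*maskp = mask;
	return (0);
}
```

For example, "rr:0-2,4" yields POLICY_ROUNDROBIN with mask 0x17 (domains 0, 1, 2 and 4), while "prefer:0-1" is rejected because the prefer policy accepts a single domain. A plain 64-bit mask keeps the sketch portable; code that talks to the kernel would instead fill the domainset_t used by cpuset_setdomain(2).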