From owner-svn-src-user@freebsd.org  Mon Nov 13 03:24:58 2017
Return-Path: <owner-svn-src-user@freebsd.org>
Delivered-To: svn-src-user@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9EE35CFDC7E
 for <svn-src-user@mailman.ysv.freebsd.org>;
 Mon, 13 Nov 2017 03:24:58 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 4D30A7F81D;
 Mon, 13 Nov 2017 03:24:58 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id vAD3OvKE090019;
 Mon, 13 Nov 2017 03:24:57 GMT (envelope-from jeff@FreeBSD.org)
Received: (from jeff@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id vAD3Ovhu090018;
 Mon, 13 Nov 2017 03:24:57 GMT (envelope-from jeff@FreeBSD.org)
Message-Id: <201711130324.vAD3Ovhu090018@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: jeff set sender to
 jeff@FreeBSD.org using -f
From: Jeff Roberson <jeff@FreeBSD.org>
Date: Mon, 13 Nov 2017 03:24:57 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325751 - user/jeff
X-SVN-Group: user
X-SVN-Commit-Author: jeff
X-SVN-Commit-Paths: user/jeff
X-SVN-Commit-Revision: 325751
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-user@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: "SVN commit messages for the experimental &quot; user&quot;
 src tree" <svn-src-user.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-user/>
List-Post: <mailto:svn-src-user@freebsd.org>
List-Help: <mailto:svn-src-user-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Nov 2017 03:24:58 -0000

Author: jeff
Date: Mon Nov 13 03:24:57 2017
New Revision: 325751
URL: https://svnweb.freebsd.org/changeset/base/325751

Log:
  Make a directory for my projects

Added:
  user/jeff/

From owner-svn-src-user@freebsd.org  Mon Nov 13 03:25:44 2017
Return-Path: <owner-svn-src-user@freebsd.org>
Delivered-To: svn-src-user@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B369ACFDC99
 for <svn-src-user@mailman.ysv.freebsd.org>;
 Mon, 13 Nov 2017 03:25:44 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 814937F8FB;
 Mon, 13 Nov 2017 03:25:44 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id vAD3PhXA090099;
 Mon, 13 Nov 2017 03:25:43 GMT (envelope-from jeff@FreeBSD.org)
Received: (from jeff@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id vAD3PhtK090098;
 Mon, 13 Nov 2017 03:25:43 GMT (envelope-from jeff@FreeBSD.org)
Message-Id: <201711130325.vAD3PhtK090098@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: jeff set sender to
 jeff@FreeBSD.org using -f
From: Jeff Roberson <jeff@FreeBSD.org>
Date: Mon, 13 Nov 2017 03:25:43 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325752 - user/jeff/numa
X-SVN-Group: user
X-SVN-Commit-Author: jeff
X-SVN-Commit-Paths: user/jeff/numa
X-SVN-Commit-Revision: 325752
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-user@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: "SVN commit messages for the experimental &quot; user&quot;
 src tree" <svn-src-user.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-user/>
List-Post: <mailto:svn-src-user@freebsd.org>
List-Help: <mailto:svn-src-user-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Nov 2017 03:25:44 -0000

Author: jeff
Date: Mon Nov 13 03:25:43 2017
New Revision: 325752
URL: https://svnweb.freebsd.org/changeset/base/325752

Log:
  Make a staging branch for numa patches

Added:
  user/jeff/numa/
     - copied from r325751, head/
Directory Properties:
  user/jeff/numa/   (props changed)

From owner-svn-src-user@freebsd.org  Mon Nov 13 03:34:57 2017
Return-Path: <owner-svn-src-user@freebsd.org>
Delivered-To: svn-src-user@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1D320CFDF1D
 for <svn-src-user@mailman.ysv.freebsd.org>;
 Mon, 13 Nov 2017 03:34:57 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id B99DB7FCE9;
 Mon, 13 Nov 2017 03:34:56 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id vAD3YtXe094160;
 Mon, 13 Nov 2017 03:34:55 GMT (envelope-from jeff@FreeBSD.org)
Received: (from jeff@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id vAD3YtnW094151;
 Mon, 13 Nov 2017 03:34:55 GMT (envelope-from jeff@FreeBSD.org)
Message-Id: <201711130334.vAD3YtnW094151@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: jeff set sender to
 jeff@FreeBSD.org using -f
From: Jeff Roberson <jeff@FreeBSD.org>
Date: Mon, 13 Nov 2017 03:34:55 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325753 - user/jeff/numa/sys/vm
X-SVN-Group: user
X-SVN-Commit-Author: jeff
X-SVN-Commit-Paths: user/jeff/numa/sys/vm
X-SVN-Commit-Revision: 325753
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-user@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: "SVN commit messages for the experimental &quot; user&quot;
 src tree" <svn-src-user.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-user/>
List-Post: <mailto:svn-src-user@freebsd.org>
List-Help: <mailto:svn-src-user-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Nov 2017 03:34:57 -0000

Author: jeff
Date: Mon Nov 13 03:34:55 2017
New Revision: 325753
URL: https://svnweb.freebsd.org/changeset/base/325753

Log:
  Move NUMA policy iterators into the page allocator layer
  
  https://reviews.freebsd.org/D13014
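
  The change introduces a common pattern at the vm_page layer: walk the
  NUMA domains selected by the current policy iterator and hold back the
  sleepable wait flags (VM_ALLOC_WAITOK/VM_ALLOC_WAITFAIL) until the last
  domain, so the allocator sleeps at most once per request.  A minimal
  sketch of that pattern, using the iterator KPI shown in the diff below
  (the wrapper name here is hypothetical, for illustration only):

	static vm_page_t
	alloc_across_domains(vm_object_t object, vm_pindex_t pindex, int req,
	    vm_page_t mpred)
	{
		struct vm_domain_iterator vi;
		vm_page_t m;
		int domain, wait;

		m = NULL;
		vm_policy_iterator_init(&vi);
		/* Defer blocking flags until the final domain is tried. */
		wait = req & (VM_ALLOC_WAITFAIL | VM_ALLOC_WAITOK);
		req &= ~wait;
		while (vm_domain_iterator_run(&vi, &domain) == 0) {
			if (vm_domain_iterator_isdone(&vi))
				req |= wait;
			m = vm_page_alloc_domain_after(object, pindex, domain,
			    req, mpred);
			if (m != NULL)
				break;
		}
		vm_policy_iterator_finish(&vi);
		return (m);
	}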

Modified:
  user/jeff/numa/sys/vm/vm_domain.c
  user/jeff/numa/sys/vm/vm_domain.h
  user/jeff/numa/sys/vm/vm_page.c
  user/jeff/numa/sys/vm/vm_page.h
  user/jeff/numa/sys/vm/vm_phys.c
  user/jeff/numa/sys/vm/vm_phys.h
  user/jeff/numa/sys/vm/vm_reserv.c
  user/jeff/numa/sys/vm/vm_reserv.h

Modified: user/jeff/numa/sys/vm/vm_domain.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_domain.c	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_domain.c	Mon Nov 13 03:34:55 2017	(r325753)
@@ -61,6 +61,118 @@ __FBSDID("$FreeBSD$");
 
 #include <vm/vm_domain.h>
 
+/*
+ * Default to first-touch + round-robin.
+ */
+static struct mtx vm_default_policy_mtx;
+MTX_SYSINIT(vm_default_policy, &vm_default_policy_mtx, "default policy mutex",
+    MTX_DEF);
+#ifdef VM_NUMA_ALLOC
+static struct vm_domain_policy vm_default_policy =
+    VM_DOMAIN_POLICY_STATIC_INITIALISER(VM_POLICY_FIRST_TOUCH_ROUND_ROBIN, 0);
+#else
+/* Use round-robin so the domain policy code will only try once per allocation */
+static struct vm_domain_policy vm_default_policy =
+    VM_DOMAIN_POLICY_STATIC_INITIALISER(VM_POLICY_ROUND_ROBIN, 0);
+#endif
+
+static int
+sysctl_vm_default_policy(SYSCTL_HANDLER_ARGS)
+{
+	char policy_name[32];
+	int error;
+
+	mtx_lock(&vm_default_policy_mtx);
+
+	/* Map policy to output string */
+	switch (vm_default_policy.p.policy) {
+	case VM_POLICY_FIRST_TOUCH:
+		strcpy(policy_name, "first-touch");
+		break;
+	case VM_POLICY_FIRST_TOUCH_ROUND_ROBIN:
+		strcpy(policy_name, "first-touch-rr");
+		break;
+	case VM_POLICY_ROUND_ROBIN:
+	default:
+		strcpy(policy_name, "rr");
+		break;
+	}
+	mtx_unlock(&vm_default_policy_mtx);
+
+	error = sysctl_handle_string(oidp, &policy_name[0],
+	    sizeof(policy_name), req);
+	if (error != 0 || req->newptr == NULL)
+		return (error);
+
+	mtx_lock(&vm_default_policy_mtx);
+	/* Set: match on the subset of policies that make sense as a default */
+	if (strcmp("first-touch-rr", policy_name) == 0) {
+		vm_domain_policy_set(&vm_default_policy,
+		    VM_POLICY_FIRST_TOUCH_ROUND_ROBIN, 0);
+	} else if (strcmp("first-touch", policy_name) == 0) {
+		vm_domain_policy_set(&vm_default_policy,
+		    VM_POLICY_FIRST_TOUCH, 0);
+	} else if (strcmp("rr", policy_name) == 0) {
+		vm_domain_policy_set(&vm_default_policy,
+		    VM_POLICY_ROUND_ROBIN, 0);
+	} else {
+		error = EINVAL;
+		goto finish;
+	}
+
+	error = 0;
+finish:
+	mtx_unlock(&vm_default_policy_mtx);
+	return (error);
+}
+
+SYSCTL_PROC(_vm, OID_AUTO, default_policy, CTLTYPE_STRING | CTLFLAG_RW,
+    0, 0, sysctl_vm_default_policy, "A",
+    "Default policy (rr, first-touch, first-touch-rr");
+
+/*
+ * Initialise a VM domain iterator.
+ *
+ * Check the thread policy, then the proc policy,
+ * then default to the system policy.
+ */
+void
+vm_policy_iterator_init(struct vm_domain_iterator *vi)
+{
+#ifdef VM_NUMA_ALLOC
+	struct vm_domain_policy lcl;
+#endif
+
+	vm_domain_iterator_init(vi);
+
+#ifdef VM_NUMA_ALLOC
+	/* Copy out the thread policy */
+	vm_domain_policy_localcopy(&lcl, &curthread->td_vm_dom_policy);
+	if (lcl.p.policy != VM_POLICY_NONE) {
+		/* Thread policy is present; use it */
+		vm_domain_iterator_set_policy(vi, &lcl);
+		return;
+	}
+
+	vm_domain_policy_localcopy(&lcl,
+	    &curthread->td_proc->p_vm_dom_policy);
+	if (lcl.p.policy != VM_POLICY_NONE) {
+		/* Process policy is present; use it */
+		vm_domain_iterator_set_policy(vi, &lcl);
+		return;
+	}
+#endif
+	/* Use system default policy */
+	vm_domain_iterator_set_policy(vi, &vm_default_policy);
+}
+
+void
+vm_policy_iterator_finish(struct vm_domain_iterator *vi)
+{
+
+	vm_domain_iterator_cleanup(vi);
+}
+
 #ifdef VM_NUMA_ALLOC
 static __inline int
 vm_domain_rr_selectdomain(int skip_domain)

Modified: user/jeff/numa/sys/vm/vm_domain.h
==============================================================================
--- user/jeff/numa/sys/vm/vm_domain.h	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_domain.h	Mon Nov 13 03:34:55 2017	(r325753)
@@ -63,4 +63,7 @@ extern	int vm_domain_iterator_run(struct vm_domain_ite
 extern	int vm_domain_iterator_isdone(struct vm_domain_iterator *vi);
 extern	int vm_domain_iterator_cleanup(struct vm_domain_iterator *vi);
 
+extern	void vm_policy_iterator_init(struct vm_domain_iterator *vi);
+extern	void vm_policy_iterator_finish(struct vm_domain_iterator *vi);
+
 #endif	/* __VM_DOMAIN_H__ */

Modified: user/jeff/numa/sys/vm/vm_page.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_page.c	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_page.c	Mon Nov 13 03:34:55 2017	(r325753)
@@ -107,6 +107,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm.h>
 #include <vm/pmap.h>
 #include <vm/vm_param.h>
+#include <vm/vm_domain.h>
 #include <vm/vm_kern.h>
 #include <vm/vm_object.h>
 #include <vm/vm_page.h>
@@ -1577,6 +1578,16 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, 
 	    vm_radix_lookup_le(&object->rtree, pindex) : NULL));
 }
 
+vm_page_t
+vm_page_alloc_domain(vm_object_t object, vm_pindex_t pindex, int domain,
+    int req)
+{
+
+	return (vm_page_alloc_domain_after(object, pindex, domain, req,
+	    object != NULL ? vm_radix_lookup_le(&object->rtree, pindex) :
+	    NULL));
+}
+
 /*
  * Allocate a page in the specified object with the given page index.  To
 * optimize insertion of the page into the object, the caller must also specify
@@ -1584,10 +1595,35 @@ vm_page_alloc(vm_object_t object, vm_pindex_t pindex, 
  * page index, or NULL if no such page exists.
  */
 vm_page_t
-vm_page_alloc_after(vm_object_t object, vm_pindex_t pindex, int req,
-    vm_page_t mpred)
+vm_page_alloc_after(vm_object_t object, vm_pindex_t pindex,
+    int req, vm_page_t mpred)
 {
+	struct vm_domain_iterator vi;
 	vm_page_t m;
+	int domain, wait;
+
+	m = NULL;
+	vm_policy_iterator_init(&vi);
+	wait = req & (VM_ALLOC_WAITFAIL | VM_ALLOC_WAITOK);
+	req &= ~wait;
+	while ((vm_domain_iterator_run(&vi, &domain)) == 0) {
+		if (vm_domain_iterator_isdone(&vi))
+			req |= wait;
+		m = vm_page_alloc_domain_after(object, pindex, domain, req,
+		    mpred);
+		if (m != NULL)
+			break;
+	}
+	vm_policy_iterator_finish(&vi);
+
+	return (m);
+}
+
+vm_page_t
+vm_page_alloc_domain_after(vm_object_t object, vm_pindex_t pindex, int domain,
+    int req, vm_page_t mpred)
+{
+	vm_page_t m;
 	int flags, req_class;
 	u_int free_count;
 
@@ -1617,6 +1653,7 @@ vm_page_alloc_after(vm_object_t object, vm_pindex_t pi
 	 * for the request class.
 	 */
 again:
+	m = NULL;
 	mtx_lock(&vm_page_queue_free_mtx);
 	if (vm_cnt.v_free_count > vm_cnt.v_free_reserved ||
 	    (req_class == VM_ALLOC_SYSTEM &&
@@ -1629,23 +1666,26 @@ again:
 #if VM_NRESERVLEVEL > 0
 		if (object == NULL || (object->flags & (OBJ_COLORED |
 		    OBJ_FICTITIOUS)) != OBJ_COLORED || (m =
-		    vm_reserv_alloc_page(object, pindex, mpred)) == NULL)
+		    vm_reserv_alloc_page(object, pindex, domain,
+		    mpred)) == NULL)
 #endif
 		{
 			/*
 			 * If not, allocate it from the free page queues.
 			 */
-			m = vm_phys_alloc_pages(object != NULL ?
+			m = vm_phys_alloc_pages(domain, object != NULL ?
 			    VM_FREEPOOL_DEFAULT : VM_FREEPOOL_DIRECT, 0);
 #if VM_NRESERVLEVEL > 0
-			if (m == NULL && vm_reserv_reclaim_inactive()) {
-				m = vm_phys_alloc_pages(object != NULL ?
+			if (m == NULL && vm_reserv_reclaim_inactive(domain)) {
+				m = vm_phys_alloc_pages(domain,
+				    object != NULL ?
 				    VM_FREEPOOL_DEFAULT : VM_FREEPOOL_DIRECT,
 				    0);
 			}
 #endif
 		}
-	} else {
+	}
+	if (m == NULL) {
 		/*
 		 * Not allocatable, give up.
 		 */
@@ -1773,6 +1813,32 @@ vm_page_alloc_contig(vm_object_t object, vm_pindex_t p
     u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment,
     vm_paddr_t boundary, vm_memattr_t memattr)
 {
+	struct vm_domain_iterator vi;
+	vm_page_t m;
+	int domain, wait;
+
+	m = NULL;
+	vm_policy_iterator_init(&vi);
+	wait = req & (VM_ALLOC_WAITFAIL | VM_ALLOC_WAITOK);
+	req &= ~wait;
+	while ((vm_domain_iterator_run(&vi, &domain)) == 0) {
+		if (vm_domain_iterator_isdone(&vi))
+			req |= wait;
+		m = vm_page_alloc_contig_domain(object, pindex, domain, req,
+		    npages, low, high, alignment, boundary, memattr);
+		if (m != NULL)
+			break;
+	}
+	vm_policy_iterator_finish(&vi);
+
+	return (m);
+}
+
+vm_page_t
+vm_page_alloc_contig_domain(vm_object_t object, vm_pindex_t pindex, int domain,
+    int req, u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment,
+    vm_paddr_t boundary, vm_memattr_t memattr)
+{
 	vm_page_t m, m_ret, mpred;
 	u_int busy_lock, flags, oflags;
 	int req_class;
@@ -1812,6 +1878,7 @@ vm_page_alloc_contig(vm_object_t object, vm_pindex_t p
 	 * below the lower bound for the allocation class?
 	 */
 again:
+	m_ret = NULL;
 	mtx_lock(&vm_page_queue_free_mtx);
 	if (vm_cnt.v_free_count >= npages + vm_cnt.v_free_reserved ||
 	    (req_class == VM_ALLOC_SYSTEM &&
@@ -1824,31 +1891,27 @@ again:
 #if VM_NRESERVLEVEL > 0
 retry:
 		if (object == NULL || (object->flags & OBJ_COLORED) == 0 ||
-		    (m_ret = vm_reserv_alloc_contig(object, pindex, npages,
-		    low, high, alignment, boundary, mpred)) == NULL)
+		    (m_ret = vm_reserv_alloc_contig(object, pindex, domain,
+		    npages, low, high, alignment, boundary, mpred)) == NULL)
 #endif
 			/*
 			 * If not, allocate them from the free page queues.
 			 */
-			m_ret = vm_phys_alloc_contig(npages, low, high,
+			m_ret = vm_phys_alloc_contig(domain, npages, low, high,
 			    alignment, boundary);
-	} else {
-		if (vm_page_alloc_fail(object, req))
-			goto again;
-		return (NULL);
-	}
-	if (m_ret != NULL)
-		vm_phys_freecnt_adj(m_ret, -npages);
-	else {
 #if VM_NRESERVLEVEL > 0
-		if (vm_reserv_reclaim_contig(npages, low, high, alignment,
-		    boundary))
+		if (m_ret == NULL && vm_reserv_reclaim_contig(
+		    domain, npages, low, high, alignment, boundary))
 			goto retry;
 #endif
 	}
-	mtx_unlock(&vm_page_queue_free_mtx);
-	if (m_ret == NULL)
+	if (m_ret == NULL) {
+		if (vm_page_alloc_fail(object, req))
+			goto again;
 		return (NULL);
+	}
+	vm_phys_freecnt_adj(m_ret, -npages);
+	mtx_unlock(&vm_page_queue_free_mtx);
 	for (m = m_ret; m < &m_ret[npages]; m++)
 		vm_page_alloc_check(m);
 
@@ -1962,7 +2025,30 @@ vm_page_alloc_check(vm_page_t m)
 vm_page_t
 vm_page_alloc_freelist(int flind, int req)
 {
+	struct vm_domain_iterator vi;
 	vm_page_t m;
+	int domain, wait;
+
+	m = NULL;
+	vm_policy_iterator_init(&vi);
+	wait = req & (VM_ALLOC_WAITFAIL | VM_ALLOC_WAITOK);
+	req &= ~wait;
+	while ((vm_domain_iterator_run(&vi, &domain)) == 0) {
+		if (vm_domain_iterator_isdone(&vi))
+			req |= wait;
+		m = vm_page_alloc_freelist_domain(domain, flind, req);
+		if (m != NULL)
+			break;
+	}
+	vm_policy_iterator_finish(&vi);
+
+	return (m);
+}
+
+vm_page_t
+vm_page_alloc_freelist_domain(int domain, int flind, int req)
+{
+	vm_page_t m;
 	u_int flags, free_count;
 	int req_class;
 
@@ -1983,15 +2069,12 @@ again:
 	    (req_class == VM_ALLOC_SYSTEM &&
 	    vm_cnt.v_free_count > vm_cnt.v_interrupt_free_min) ||
 	    (req_class == VM_ALLOC_INTERRUPT &&
-	    vm_cnt.v_free_count > 0)) {
-		m = vm_phys_alloc_freelist_pages(flind, VM_FREEPOOL_DIRECT, 0);
-	} else {
+	    vm_cnt.v_free_count > 0))
+		m = vm_phys_alloc_freelist_pages(domain, flind,
+		    VM_FREEPOOL_DIRECT, 0);
+	if (m == NULL) {
 		if (vm_page_alloc_fail(NULL, req))
 			goto again;
-		return (NULL);
-	}
-	if (m == NULL) {
-		mtx_unlock(&vm_page_queue_free_mtx);
 		return (NULL);
 	}
 	free_count = vm_phys_freecnt_adj(m, -1);

Modified: user/jeff/numa/sys/vm/vm_page.h
==============================================================================
--- user/jeff/numa/sys/vm/vm_page.h	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_page.h	Mon Nov 13 03:34:55 2017	(r325753)
@@ -474,16 +474,24 @@ void vm_page_free_zero(vm_page_t m);
 void vm_page_activate (vm_page_t);
 void vm_page_advise(vm_page_t m, int advice);
 vm_page_t vm_page_alloc(vm_object_t, vm_pindex_t, int);
+vm_page_t vm_page_alloc_domain(vm_object_t, vm_pindex_t, int, int);
 vm_page_t vm_page_alloc_after(vm_object_t, vm_pindex_t, int, vm_page_t);
+vm_page_t vm_page_alloc_domain_after(vm_object_t, vm_pindex_t, int, int,
+    vm_page_t);
 vm_page_t vm_page_alloc_contig(vm_object_t object, vm_pindex_t pindex, int req,
     u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment,
     vm_paddr_t boundary, vm_memattr_t memattr);
+vm_page_t vm_page_alloc_contig_domain(vm_object_t object,
+    vm_pindex_t pindex, int domain, int req, u_long npages, vm_paddr_t low,
+    vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
+    vm_memattr_t memattr);
 vm_page_t vm_page_alloc_freelist(int, int);
+vm_page_t vm_page_alloc_freelist_domain(int, int, int);
 void vm_page_change_lock(vm_page_t m, struct mtx **mtx);
 vm_page_t vm_page_grab (vm_object_t, vm_pindex_t, int);
 int vm_page_grab_pages(vm_object_t object, vm_pindex_t pindex, int allocflags,
     vm_page_t *ma, int count);
-void vm_page_deactivate (vm_page_t);
+void vm_page_deactivate(vm_page_t);
 void vm_page_deactivate_noreuse(vm_page_t);
 void vm_page_dequeue(vm_page_t m);
 void vm_page_dequeue_locked(vm_page_t m);
@@ -504,6 +512,8 @@ void vm_page_putfake(vm_page_t m);
 void vm_page_readahead_finish(vm_page_t m);
 bool vm_page_reclaim_contig(int req, u_long npages, vm_paddr_t low,
     vm_paddr_t high, u_long alignment, vm_paddr_t boundary);
+bool vm_page_reclaim_contig_domain(int req, u_long npages, int domain,
+    vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary);
 void vm_page_reference(vm_page_t m);
 void vm_page_remove (vm_page_t);
 int vm_page_rename (vm_page_t, vm_object_t, vm_pindex_t);

Modified: user/jeff/numa/sys/vm/vm_phys.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_phys.c	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_phys.c	Mon Nov 13 03:34:55 2017	(r325753)
@@ -149,23 +149,6 @@ SYSCTL_OID(_vm, OID_AUTO, phys_locality, CTLTYPE_STRIN
 SYSCTL_INT(_vm, OID_AUTO, ndomains, CTLFLAG_RD,
     &vm_ndomains, 0, "Number of physical memory domains available.");
 
-/*
- * Default to first-touch + round-robin.
- */
-static struct mtx vm_default_policy_mtx;
-MTX_SYSINIT(vm_default_policy, &vm_default_policy_mtx, "default policy mutex",
-    MTX_DEF);
-#ifdef VM_NUMA_ALLOC
-static struct vm_domain_policy vm_default_policy =
-    VM_DOMAIN_POLICY_STATIC_INITIALISER(VM_POLICY_FIRST_TOUCH_ROUND_ROBIN, 0);
-#else
-/* Use round-robin so the domain policy code will only try once per allocation */
-static struct vm_domain_policy vm_default_policy =
-    VM_DOMAIN_POLICY_STATIC_INITIALISER(VM_POLICY_ROUND_ROBIN, 0);
-#endif
-
-static vm_page_t vm_phys_alloc_domain_pages(int domain, int flind, int pool,
-    int order);
 static vm_page_t vm_phys_alloc_seg_contig(struct vm_phys_seg *seg,
     u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment,
     vm_paddr_t boundary);
@@ -175,60 +158,6 @@ static int vm_phys_paddr_to_segind(vm_paddr_t pa);
 static void vm_phys_split_pages(vm_page_t m, int oind, struct vm_freelist *fl,
     int order);
 
-static int
-sysctl_vm_default_policy(SYSCTL_HANDLER_ARGS)
-{
-	char policy_name[32];
-	int error;
-
-	mtx_lock(&vm_default_policy_mtx);
-
-	/* Map policy to output string */
-	switch (vm_default_policy.p.policy) {
-	case VM_POLICY_FIRST_TOUCH:
-		strcpy(policy_name, "first-touch");
-		break;
-	case VM_POLICY_FIRST_TOUCH_ROUND_ROBIN:
-		strcpy(policy_name, "first-touch-rr");
-		break;
-	case VM_POLICY_ROUND_ROBIN:
-	default:
-		strcpy(policy_name, "rr");
-		break;
-	}
-	mtx_unlock(&vm_default_policy_mtx);
-
-	error = sysctl_handle_string(oidp, &policy_name[0],
-	    sizeof(policy_name), req);
-	if (error != 0 || req->newptr == NULL)
-		return (error);
-
-	mtx_lock(&vm_default_policy_mtx);
-	/* Set: match on the subset of policies that make sense as a default */
-	if (strcmp("first-touch-rr", policy_name) == 0) {
-		vm_domain_policy_set(&vm_default_policy,
-		    VM_POLICY_FIRST_TOUCH_ROUND_ROBIN, 0);
-	} else if (strcmp("first-touch", policy_name) == 0) {
-		vm_domain_policy_set(&vm_default_policy,
-		    VM_POLICY_FIRST_TOUCH, 0);
-	} else if (strcmp("rr", policy_name) == 0) {
-		vm_domain_policy_set(&vm_default_policy,
-		    VM_POLICY_ROUND_ROBIN, 0);
-	} else {
-		error = EINVAL;
-		goto finish;
-	}
-
-	error = 0;
-finish:
-	mtx_unlock(&vm_default_policy_mtx);
-	return (error);
-}
-
-SYSCTL_PROC(_vm, OID_AUTO, default_policy, CTLTYPE_STRING | CTLFLAG_RW,
-    0, 0, sysctl_vm_default_policy, "A",
-    "Default policy (rr, first-touch, first-touch-rr");
-
 /*
  * Red-black tree helpers for vm fictitious range management.
  */
@@ -270,71 +199,6 @@ vm_phys_fictitious_cmp(struct vm_phys_fictitious_seg *
 	    (uintmax_t)p1->end, (uintmax_t)p2->start, (uintmax_t)p2->end);
 }
 
-#ifdef notyet
-static __inline int
-vm_rr_selectdomain(void)
-{
-#ifdef VM_NUMA_ALLOC
-	struct thread *td;
-
-	td = curthread;
-
-	td->td_dom_rr_idx++;
-	td->td_dom_rr_idx %= vm_ndomains;
-	return (td->td_dom_rr_idx);
-#else
-	return (0);
-#endif
-}
-#endif /* notyet */
-
-/*
- * Initialise a VM domain iterator.
- *
- * Check the thread policy, then the proc policy,
- * then default to the system policy.
- *
- * Later on the various layers will have this logic
- * plumbed into them and the phys code will be explicitly
- * handed a VM domain policy to use.
- */
-static void
-vm_policy_iterator_init(struct vm_domain_iterator *vi)
-{
-#ifdef VM_NUMA_ALLOC
-	struct vm_domain_policy lcl;
-#endif
-
-	vm_domain_iterator_init(vi);
-
-#ifdef VM_NUMA_ALLOC
-	/* Copy out the thread policy */
-	vm_domain_policy_localcopy(&lcl, &curthread->td_vm_dom_policy);
-	if (lcl.p.policy != VM_POLICY_NONE) {
-		/* Thread policy is present; use it */
-		vm_domain_iterator_set_policy(vi, &lcl);
-		return;
-	}
-
-	vm_domain_policy_localcopy(&lcl,
-	    &curthread->td_proc->p_vm_dom_policy);
-	if (lcl.p.policy != VM_POLICY_NONE) {
-		/* Process policy is present; use it */
-		vm_domain_iterator_set_policy(vi, &lcl);
-		return;
-	}
-#endif
-	/* Use system default policy */
-	vm_domain_iterator_set_policy(vi, &vm_default_policy);
-}
-
-static void
-vm_policy_iterator_finish(struct vm_domain_iterator *vi)
-{
-
-	vm_domain_iterator_cleanup(vi);
-}
-
 boolean_t
 vm_phys_domain_intersects(long mask, vm_paddr_t low, vm_paddr_t high)
 {
@@ -503,7 +367,7 @@ _vm_phys_create_seg(vm_paddr_t start, vm_paddr_t end, 
 
 	KASSERT(vm_phys_nsegs < VM_PHYSSEG_MAX,
 	    ("vm_phys_create_seg: increase VM_PHYSSEG_MAX"));
-	KASSERT(domain < vm_ndomains,
+	KASSERT(domain >= 0 && domain < vm_ndomains,
 	    ("vm_phys_create_seg: invalid domain provided"));
 	seg = &vm_phys_segs[vm_phys_nsegs++];
 	while (seg > vm_phys_segs && (seg - 1)->start >= end) {
@@ -760,29 +624,16 @@ vm_phys_init_page(vm_paddr_t pa)
  * The free page queues must be locked.
  */
 vm_page_t
-vm_phys_alloc_pages(int pool, int order)
+vm_phys_alloc_pages(int domain, int pool, int order)
 {
 	vm_page_t m;
-	int domain, flind;
-	struct vm_domain_iterator vi;
+	int flind;
 
-	KASSERT(pool < VM_NFREEPOOL,
-	    ("vm_phys_alloc_pages: pool %d is out of range", pool));
-	KASSERT(order < VM_NFREEORDER,
-	    ("vm_phys_alloc_pages: order %d is out of range", order));
-
-	vm_policy_iterator_init(&vi);
-
-	while ((vm_domain_iterator_run(&vi, &domain)) == 0) {
-		for (flind = 0; flind < vm_nfreelists; flind++) {
-			m = vm_phys_alloc_domain_pages(domain, flind, pool,
-			    order);
-			if (m != NULL)
-				return (m);
-		}
+	for (flind = 0; flind < vm_nfreelists; flind++) {
+		m = vm_phys_alloc_freelist_pages(domain, flind, pool, order);
+		if (m != NULL)
+			return (m);
 	}
-
-	vm_policy_iterator_finish(&vi);
 	return (NULL);
 }
 
@@ -794,41 +645,23 @@ vm_phys_alloc_pages(int pool, int order)
  * The free page queues must be locked.
  */
 vm_page_t
-vm_phys_alloc_freelist_pages(int freelist, int pool, int order)
+vm_phys_alloc_freelist_pages(int domain, int flind, int pool, int order)
 {
+	struct vm_freelist *alt, *fl;
 	vm_page_t m;
-	struct vm_domain_iterator vi;
-	int domain;
+	int oind, pind;
 
-	KASSERT(freelist < VM_NFREELIST,
+	KASSERT(domain >= 0 && domain < vm_ndomains,
+	    ("vm_phys_alloc_freelist_pages: domain %d is out of range",
+	    domain));
+	KASSERT(flind < VM_NFREELIST,
 	    ("vm_phys_alloc_freelist_pages: freelist %d is out of range",
-	    freelist));
+	    flind));
 	KASSERT(pool < VM_NFREEPOOL,
 	    ("vm_phys_alloc_freelist_pages: pool %d is out of range", pool));
 	KASSERT(order < VM_NFREEORDER,
 	    ("vm_phys_alloc_freelist_pages: order %d is out of range", order));
 
-	vm_policy_iterator_init(&vi);
-
-	while ((vm_domain_iterator_run(&vi, &domain)) == 0) {
-		m = vm_phys_alloc_domain_pages(domain,
-		    vm_freelist_to_flind[freelist], pool, order);
-		if (m != NULL)
-			return (m);
-	}
-
-	vm_policy_iterator_finish(&vi);
-	return (NULL);
-}
-
-static vm_page_t
-vm_phys_alloc_domain_pages(int domain, int flind, int pool, int order)
-{	
-	struct vm_freelist *fl;
-	struct vm_freelist *alt;
-	int oind, pind;
-	vm_page_t m;
-
 	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
 	fl = &vm_phys_free_queues[domain][flind][pool][0];
 	for (oind = order; oind < VM_NFREEORDER; oind++) {
@@ -1303,14 +1136,13 @@ vm_phys_unfree_page(vm_page_t m)
  * "alignment" and "boundary" must be a power of two.
  */
 vm_page_t
-vm_phys_alloc_contig(u_long npages, vm_paddr_t low, vm_paddr_t high,
+vm_phys_alloc_contig(int domain, u_long npages, vm_paddr_t low, vm_paddr_t high,
     u_long alignment, vm_paddr_t boundary)
 {
 	vm_paddr_t pa_end, pa_start;
 	vm_page_t m_run;
-	struct vm_domain_iterator vi;
 	struct vm_phys_seg *seg;
-	int domain, segind;
+	int segind;
 
 	KASSERT(npages > 0, ("npages is 0"));
 	KASSERT(powerof2(alignment), ("alignment is not a power of 2"));
@@ -1318,12 +1150,6 @@ vm_phys_alloc_contig(u_long npages, vm_paddr_t low, vm
 	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
 	if (low >= high)
 		return (NULL);
-	vm_policy_iterator_init(&vi);
-restartdom:
-	if (vm_domain_iterator_run(&vi, &domain) != 0) {
-		vm_policy_iterator_finish(&vi);
-		return (NULL);
-	}
 	m_run = NULL;
 	for (segind = vm_phys_nsegs - 1; segind >= 0; segind--) {
 		seg = &vm_phys_segs[segind];
@@ -1346,9 +1172,6 @@ restartdom:
 		if (m_run != NULL)
 			break;
 	}
-	if (m_run == NULL && !vm_domain_iterator_isdone(&vi))
-		goto restartdom;
-	vm_policy_iterator_finish(&vi);
 	return (m_run);
 }
 

Modified: user/jeff/numa/sys/vm/vm_phys.h
==============================================================================
--- user/jeff/numa/sys/vm/vm_phys.h	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_phys.h	Mon Nov 13 03:34:55 2017	(r325753)
@@ -70,10 +70,11 @@ extern int vm_phys_nsegs;
  * The following functions are only to be used by the virtual memory system.
  */
 void vm_phys_add_seg(vm_paddr_t start, vm_paddr_t end);
-vm_page_t vm_phys_alloc_contig(u_long npages, vm_paddr_t low, vm_paddr_t high,
-    u_long alignment, vm_paddr_t boundary);
-vm_page_t vm_phys_alloc_freelist_pages(int freelist, int pool, int order);
-vm_page_t vm_phys_alloc_pages(int pool, int order);
+vm_page_t vm_phys_alloc_contig(int domain, u_long npages, vm_paddr_t low,
+    vm_paddr_t high, u_long alignment, vm_paddr_t boundary);
+vm_page_t vm_phys_alloc_freelist_pages(int domain, int freelist, int pool,
+    int order);
+vm_page_t vm_phys_alloc_pages(int domain, int pool, int order);
 boolean_t vm_phys_domain_intersects(long mask, vm_paddr_t low, vm_paddr_t high);
 int vm_phys_fictitious_reg_range(vm_paddr_t start, vm_paddr_t end,
     vm_memattr_t memattr);
@@ -91,12 +92,13 @@ boolean_t vm_phys_unfree_page(vm_page_t m);
 int vm_phys_mem_affinity(int f, int t);
 
 /*
- *	vm_phys_domain:
  *
- * 	Return the memory domain the page belongs to.
+ *	vm_phys_domidx:
+ *
+ *	Return the index of the domain the page belongs to.
  */
-static inline struct vm_domain *
-vm_phys_domain(vm_page_t m)
+static inline int
+vm_phys_domidx(vm_page_t m)
 {
 #ifdef VM_NUMA_ALLOC
 	int domn, segind;
@@ -106,10 +108,22 @@ vm_phys_domain(vm_page_t m)
 	KASSERT(segind < vm_phys_nsegs, ("segind %d m %p", segind, m));
 	domn = vm_phys_segs[segind].domain;
 	KASSERT(domn < vm_ndomains, ("domain %d m %p", domn, m));
-	return (&vm_dom[domn]);
+	return (domn);
 #else
-	return (&vm_dom[0]);
+	return (0);
 #endif
+}
+
+/*
+ *	vm_phys_domain:
+ *
+ * 	Return the memory domain the page belongs to.
+ */
+static inline struct vm_domain *
+vm_phys_domain(vm_page_t m)
+{
+
+	return (&vm_dom[vm_phys_domidx(m)]);
 }
 
 static inline u_int

Modified: user/jeff/numa/sys/vm/vm_reserv.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_reserv.c	Mon Nov 13 03:25:43 2017	(r325752)
+++ user/jeff/numa/sys/vm/vm_reserv.c	Mon Nov 13 03:34:55 2017	(r325753)
@@ -168,6 +168,7 @@ struct vm_reserv {
 	vm_object_t	object;			/* containing object */
 	vm_pindex_t	pindex;			/* offset within object */
 	vm_page_t	pages;			/* first page of a superpage */
+	int		domain;			/* NUMA domain */
 	int		popcnt;			/* # of pages in use */
 	char		inpartpopq;
 	popmap_t	popmap[NPOPMAP];	/* bit vector of used pages */
@@ -205,8 +206,7 @@ static vm_reserv_t vm_reserv_array;
  *
  * Access to this queue is synchronized by the free page queue lock.
  */
-static TAILQ_HEAD(, vm_reserv) vm_rvq_partpop =
-			    TAILQ_HEAD_INITIALIZER(vm_rvq_partpop);
+static TAILQ_HEAD(, vm_reserv) vm_rvq_partpop[MAXMEMDOM];
 
 static SYSCTL_NODE(_vm, OID_AUTO, reserv, CTLFLAG_RD, 0, "Reservation Info");
 
@@ -275,24 +275,27 @@ sysctl_vm_reserv_partpopq(SYSCTL_HANDLER_ARGS)
 {
 	struct sbuf sbuf;
 	vm_reserv_t rv;
-	int counter, error, level, unused_pages;
+	int counter, error, domain, level, unused_pages;
 
 	error = sysctl_wire_old_buffer(req, 0);
 	if (error != 0)
 		return (error);
 	sbuf_new_for_sysctl(&sbuf, NULL, 128, req);
-	sbuf_printf(&sbuf, "\nLEVEL     SIZE  NUMBER\n\n");
-	for (level = -1; level <= VM_NRESERVLEVEL - 2; level++) {
-		counter = 0;
-		unused_pages = 0;
-		mtx_lock(&vm_page_queue_free_mtx);
-		TAILQ_FOREACH(rv, &vm_rvq_partpop/*[level]*/, partpopq) {
-			counter++;
-			unused_pages += VM_LEVEL_0_NPAGES - rv->popcnt;
+	sbuf_printf(&sbuf, "\nDOMAIN    LEVEL     SIZE  NUMBER\n\n");
+	for (domain = 0; domain < vm_ndomains; domain++) {
+		for (level = -1; level <= VM_NRESERVLEVEL - 2; level++) {
+			counter = 0;
+			unused_pages = 0;
+			mtx_lock(&vm_page_queue_free_mtx);
+			TAILQ_FOREACH(rv, &vm_rvq_partpop[domain], partpopq) {
+				counter++;
+				unused_pages += VM_LEVEL_0_NPAGES - rv->popcnt;
+			}
+			mtx_unlock(&vm_page_queue_free_mtx);
+			sbuf_printf(&sbuf, "%6d, %7d, %6dK, %6d\n",
+			    domain, level,
+			    unused_pages * ((int)PAGE_SIZE / 1024), counter);
 		}
-		mtx_unlock(&vm_page_queue_free_mtx);
-		sbuf_printf(&sbuf, "%5d: %6dK, %6d\n", level,
-		    unused_pages * ((int)PAGE_SIZE / 1024), counter);
 	}
 	error = sbuf_finish(&sbuf);
 	sbuf_delete(&sbuf);
@@ -319,8 +322,11 @@ vm_reserv_depopulate(vm_reserv_t rv, int index)
 	    index));
 	KASSERT(rv->popcnt > 0,
 	    ("vm_reserv_depopulate: reserv %p's popcnt is corrupted", rv));
+	KASSERT(rv->domain >= 0 && rv->domain < vm_ndomains,
+	    ("vm_reserv_depopulate: reserv %p's domain is corrupted %d",
+	    rv, rv->domain));
 	if (rv->inpartpopq) {
-		TAILQ_REMOVE(&vm_rvq_partpop, rv, partpopq);
+		TAILQ_REMOVE(&vm_rvq_partpop[rv->domain], rv, partpopq);
 		rv->inpartpopq = FALSE;
 	} else {
 		KASSERT(rv->pages->psind == 1,
@@ -333,11 +339,12 @@ vm_reserv_depopulate(vm_reserv_t rv, int index)
 	if (rv->popcnt == 0) {
 		LIST_REMOVE(rv, objq);
 		rv->object = NULL;
+		rv->domain = -1;
 		vm_phys_free_pages(rv->pages, VM_LEVEL_0_ORDER);
 		vm_reserv_freed++;
 	} else {
 		rv->inpartpopq = TRUE;
-		TAILQ_INSERT_TAIL(&vm_rvq_partpop, rv, partpopq);
+		TAILQ_INSERT_TAIL(&vm_rvq_partpop[rv->domain], rv, partpopq);
 	}
 }
 
@@ -382,15 +389,18 @@ vm_reserv_populate(vm_reserv_t rv, int index)
 	    ("vm_reserv_populate: reserv %p is already full", rv));
 	KASSERT(rv->pages->psind == 0,
 	    ("vm_reserv_populate: reserv %p is already promoted", rv));
+	KASSERT(rv->domain >= 0 && rv->domain < vm_ndomains,
+	    ("vm_reserv_populate: reserv %p's domain is corrupted %d",
+	    rv, rv->domain));
 	if (rv->inpartpopq) {
-		TAILQ_REMOVE(&vm_rvq_partpop, rv, partpopq);
+		TAILQ_REMOVE(&vm_rvq_partpop[rv->domain], rv, partpopq);
 		rv->inpartpopq = FALSE;
 	}
 	popmap_set(rv->popmap, index);
 	rv->popcnt++;
 	if (rv->popcnt < VM_LEVEL_0_NPAGES) {
 		rv->inpartpopq = TRUE;
-		TAILQ_INSERT_TAIL(&vm_rvq_partpop, rv, partpopq);
+		TAILQ_INSERT_TAIL(&vm_rvq_partpop[rv->domain], rv, partpopq);
 	} else
 		rv->pages->psind = 1;
 }
@@ -411,9 +421,9 @@ vm_reserv_populate(vm_reserv_t rv, int index)
  * The object and free page queue must be locked.
  */
 vm_page_t
-vm_reserv_alloc_contig(vm_object_t object, vm_pindex_t pindex, u_long npages,
-    vm_paddr_t low, vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
-    vm_page_t mpred)
+vm_reserv_alloc_contig(vm_object_t object, vm_pindex_t pindex, int domain,
+    u_long npages, vm_paddr_t low, vm_paddr_t high, u_long alignment,
+    vm_paddr_t boundary, vm_page_t mpred)
 {
 	vm_paddr_t pa, size;
 	vm_page_t m, m_ret, msucc;
@@ -533,7 +543,7 @@ vm_reserv_alloc_contig(vm_object_t object, vm_pindex_t
 	 * specified index may not be the first page within the first new
 	 * reservation.
 	 */
-	m = vm_phys_alloc_contig(allocpages, low, high, ulmax(alignment,
+	m = vm_phys_alloc_contig(domain, allocpages, low, high, ulmax(alignment,
 	    VM_LEVEL_0_SIZE), boundary > VM_LEVEL_0_SIZE ? boundary : 0);
 	if (m == NULL)
 		return (NULL);
@@ -556,6 +566,7 @@ vm_reserv_alloc_contig(vm_object_t object, vm_pindex_t
 		LIST_INSERT_HEAD(&object->rvq, rv, objq);
 		rv->object = object;
 		rv->pindex = first;
+		rv->domain = vm_phys_domidx(m);
 		KASSERT(rv->popcnt == 0,
 		    ("vm_reserv_alloc_contig: reserv %p's popcnt is corrupted",
 		    rv));
@@ -611,7 +622,8 @@ found:
  * The object and free page queue must be locked.
  */
 vm_page_t
-vm_reserv_alloc_page(vm_object_t object, vm_pindex_t pindex, vm_page_t mpred)
+vm_reserv_alloc_page(vm_object_t object, vm_pindex_t pindex, int domain,
+    vm_page_t mpred)
 {
 	vm_page_t m, msucc;
 	vm_pindex_t first, leftcap, rightcap;
@@ -690,7 +702,7 @@ vm_reserv_alloc_page(vm_object_t object, vm_pindex_t p
 	/*
 	 * Allocate and populate the new reservation.
 	 */
-	m = vm_phys_alloc_pages(VM_FREEPOOL_DEFAULT, VM_LEVEL_0_ORDER);
+	m = vm_phys_alloc_pages(domain, VM_FREEPOOL_DEFAULT, VM_LEVEL_0_ORDER);
 	if (m == NULL)
 		return (NULL);
 	rv = vm_reserv_from_page(m);
@@ -701,6 +713,7 @@ vm_reserv_alloc_page(vm_object_t object, vm_pindex_t p
 	LIST_INSERT_HEAD(&object->rvq, rv, objq);
 	rv->object = object;
 	rv->pindex = first;
+	rv->domain = vm_phys_domidx(m);
 	KASSERT(rv->popcnt == 0,
 	    ("vm_reserv_alloc_page: reserv %p's popcnt is corrupted", rv));
 	KASSERT(!rv->inpartpopq,
@@ -747,6 +760,7 @@ vm_reserv_break(vm_reserv_t rv, vm_page_t m)
 	    ("vm_reserv_break: reserv %p's inpartpopq is TRUE", rv));
 	LIST_REMOVE(rv, objq);
 	rv->object = NULL;
+	rv->domain = -1;
 	if (m != NULL) {
 		/*
 		 * Since the reservation is being broken, there is no harm in
@@ -816,7 +830,7 @@ vm_reserv_break_all(vm_object_t object)
 		KASSERT(rv->object == object,
 		    ("vm_reserv_break_all: reserv %p is corrupted", rv));
 		if (rv->inpartpopq) {
-			TAILQ_REMOVE(&vm_rvq_partpop, rv, partpopq);
+			TAILQ_REMOVE(&vm_rvq_partpop[rv->domain], rv, partpopq);
 			rv->inpartpopq = FALSE;
 		}
 		vm_reserv_break(rv, NULL);
@@ -854,7 +868,7 @@ vm_reserv_init(void)
 {
 	vm_paddr_t paddr;
 	struct vm_phys_seg *seg;
-	int segind;
+	int i, segind;
 
 	/*
 	 * Initialize the reservation array.  Specifically, initialize the
@@ -869,6 +883,8 @@ vm_reserv_init(void)
 			paddr += VM_LEVEL_0_SIZE;
 		}
 	}
+	for (i = 0; i < MAXMEMDOM; i++)
+		TAILQ_INIT(&vm_rvq_partpop[i]);
 }
 
 /*
@@ -926,7 +942,10 @@ vm_reserv_reclaim(vm_reserv_t rv)
 	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
 	KASSERT(rv->inpartpopq,
 	    ("vm_reserv_reclaim: reserv %p's inpartpopq is FALSE", rv));
-	TAILQ_REMOVE(&vm_rvq_partpop, rv, partpopq);
+	KASSERT(rv->domain >= 0 && rv->domain < vm_ndomains,
+	    ("vm_reserv_reclaim: reserv %p's domain is corrupted %d",
+	    rv, rv->domain));
+	TAILQ_REMOVE(&vm_rvq_partpop[rv->domain], rv, partpopq);
 	rv->inpartpopq = FALSE;
 	vm_reserv_break(rv, NULL);
 	vm_reserv_reclaimed++;
@@ -940,12 +959,12 @@ vm_reserv_reclaim(vm_reserv_t rv)
  * The free page queue lock must be held.
  */
 boolean_t
-vm_reserv_reclaim_inactive(void)
+vm_reserv_reclaim_inactive(int domain)
 {
 	vm_reserv_t rv;
 
 	mtx_assert(&vm_page_queue_free_mtx, MA_OWNED);
-	if ((rv = TAILQ_FIRST(&vm_rvq_partpop)) != NULL) {
+	if ((rv = TAILQ_FIRST(&vm_rvq_partpop[domain])) != NULL) {
 		vm_reserv_reclaim(rv);
 		return (TRUE);
 	}
@@ -961,8 +980,8 @@ vm_reserv_reclaim_inactive(void)
  * The free page queue lock must be held.
  */
 boolean_t
-vm_reserv_reclaim_contig(u_long npages, vm_paddr_t low, vm_paddr_t high,
-    u_long alignment, vm_paddr_t boundary)
+vm_reserv_reclaim_contig(int domain, u_long npages, vm_paddr_t low,
+    vm_paddr_t high, u_long alignment, vm_paddr_t boundary)
 {
 	vm_paddr_t pa, size;

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***

From owner-svn-src-user@freebsd.org  Mon Nov 13 03:41:52 2017
Return-Path: <owner-svn-src-user@freebsd.org>
Delivered-To: svn-src-user@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8C704CFE059
 for <svn-src-user@mailman.ysv.freebsd.org>;
 Mon, 13 Nov 2017 03:41:52 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org
 [IPv6:2610:1c1:1:6068::e6a:0])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 270B3801DE;
 Mon, 13 Nov 2017 03:41:52 +0000 (UTC)
 (envelope-from jeff@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
 by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id vAD3fp4c097660;
 Mon, 13 Nov 2017 03:41:51 GMT (envelope-from jeff@FreeBSD.org)
Received: (from jeff@localhost)
 by repo.freebsd.org (8.15.2/8.15.2/Submit) id vAD3foUw097650;
 Mon, 13 Nov 2017 03:41:50 GMT (envelope-from jeff@FreeBSD.org)
Message-Id: <201711130341.vAD3foUw097650@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: jeff set sender to
 jeff@FreeBSD.org using -f
From: Jeff Roberson <jeff@FreeBSD.org>
Date: Mon, 13 Nov 2017 03:41:50 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325754 - in user/jeff/numa/sys: kern sys vm
X-SVN-Group: user
X-SVN-Commit-Author: jeff
X-SVN-Commit-Paths: in user/jeff/numa/sys: kern sys vm
X-SVN-Commit-Revision: 325754
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-user@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: "SVN commit messages for the experimental &quot; user&quot;
 src tree" <svn-src-user.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-user/>
List-Post: <mailto:svn-src-user@freebsd.org>
List-Help: <mailto:svn-src-user-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-user>,
 <mailto:svn-src-user-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Nov 2017 03:41:52 -0000

Author: jeff
Date: Mon Nov 13 03:41:50 2017
New Revision: 325754
URL: https://svnweb.freebsd.org/changeset/base/325754

Log:
  Eliminate kmem_arena to simplify the kmem_ api for forthcoming NUMA support
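
  With kmem_arena now an alias for kernel_arena (and kmem_object for
  kernel_object), existing callers keep compiling while the kmem_*
  functions assert that only the kernel arena is passed.  A minimal
  usage sketch under that assumption (illustrative only; the
  kmem_malloc/kmem_free signatures are as shown in the diff below):

	/* Allocate and later release a page of wired kernel memory. */
	vm_offset_t buf;

	buf = kmem_malloc(kernel_arena, PAGE_SIZE, M_WAITOK | M_ZERO);
	if (buf != 0) {
		/* ... use the buffer ... */
		kmem_free(kernel_arena, buf, PAGE_SIZE);
	}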

Modified:
  user/jeff/numa/sys/kern/kern_malloc.c
  user/jeff/numa/sys/kern/subr_vmem.c
  user/jeff/numa/sys/sys/vmem.h
  user/jeff/numa/sys/vm/memguard.c
  user/jeff/numa/sys/vm/uma.h
  user/jeff/numa/sys/vm/uma_core.c
  user/jeff/numa/sys/vm/vm_kern.c
  user/jeff/numa/sys/vm/vm_map.c
  user/jeff/numa/sys/vm/vm_object.c
  user/jeff/numa/sys/vm/vm_object.h

Modified: user/jeff/numa/sys/kern/kern_malloc.c
==============================================================================
--- user/jeff/numa/sys/kern/kern_malloc.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/kern/kern_malloc.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -237,7 +237,7 @@ sysctl_kmem_map_size(SYSCTL_HANDLER_ARGS)
 {
 	u_long size;
 
-	size = vmem_size(kmem_arena, VMEM_ALLOC);
+	size = vmem_size(kernel_arena, VMEM_ALLOC);
 	return (sysctl_handle_long(oidp, &size, 0, req));
 }
 
@@ -246,7 +246,7 @@ sysctl_kmem_map_free(SYSCTL_HANDLER_ARGS)
 {
 	u_long size;
 
-	size = vmem_size(kmem_arena, VMEM_FREE);
+	size = vmem_size(kernel_arena, VMEM_FREE);
 	return (sysctl_handle_long(oidp, &size, 0, req));
 }
 
@@ -757,9 +757,8 @@ kmeminit(void)
 #else
 	tmp = vm_kmem_size;
 #endif
-	vmem_init(kmem_arena, "kmem arena", kva_alloc(tmp), tmp, PAGE_SIZE,
-	    0, 0);
-	vmem_set_reclaim(kmem_arena, kmem_reclaim);
+	vmem_set_limit(kernel_arena, tmp);
+	vmem_set_reclaim(kernel_arena, kmem_reclaim);
 
 #ifdef DEBUG_MEMGUARD
 	/*
@@ -767,7 +766,7 @@ kmeminit(void)
 	 * replacement allocator used for detecting tamper-after-free
 	 * scenarios as they occur.  It is only used for debugging.
 	 */
-	memguard_init(kmem_arena);
+	memguard_init(kernel_arena);
 #endif
 }
 

Modified: user/jeff/numa/sys/kern/subr_vmem.c
==============================================================================
--- user/jeff/numa/sys/kern/subr_vmem.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/kern/subr_vmem.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -135,6 +135,7 @@ struct vmem {
 	int			vm_nbusytag;
 	vmem_size_t		vm_inuse;
 	vmem_size_t		vm_size;
+	vmem_size_t		vm_limit;
 
 	/* Used on import. */
 	vmem_import_t		*vm_importfn;
@@ -226,11 +227,11 @@ static uma_zone_t vmem_bt_zone;
 
 /* boot time arena storage. */
 static struct vmem kernel_arena_storage;
-static struct vmem kmem_arena_storage;
 static struct vmem buffer_arena_storage;
 static struct vmem transient_arena_storage;
+/* kernel and kmem arenas are aliased for backwards KPI compat. */
 vmem_t *kernel_arena = &kernel_arena_storage;
-vmem_t *kmem_arena = &kmem_arena_storage;
+vmem_t *kmem_arena = &kernel_arena_storage;
 vmem_t *buffer_arena = &buffer_arena_storage;
 vmem_t *transient_arena = &transient_arena_storage;
 
@@ -252,11 +253,11 @@ bt_fill(vmem_t *vm, int flags)
 	VMEM_ASSERT_LOCKED(vm);
 
 	/*
-	 * Only allow the kmem arena to dip into reserve tags.  It is the
+	 * Only allow the kernel arena to dip into reserve tags.  It is the
 	 * vmem where new tags come from.
 	 */
 	flags &= BT_FLAGS;
-	if (vm != kmem_arena)
+	if (vm != kernel_arena)
 		flags &= ~M_USE_RESERVE;
 
 	/*
@@ -613,22 +614,22 @@ vmem_bt_alloc(uma_zone_t zone, vm_size_t bytes, uint8_
 {
 	vmem_addr_t addr;
 
-	*pflag = UMA_SLAB_KMEM;
+	*pflag = UMA_SLAB_KERNEL;
 
 	/*
 	 * Single thread boundary tag allocation so that the address space
 	 * and memory are added in one atomic operation.
 	 */
 	mtx_lock(&vmem_bt_lock);
-	if (vmem_xalloc(kmem_arena, bytes, 0, 0, 0, VMEM_ADDR_MIN,
+	if (vmem_xalloc(kernel_arena, bytes, 0, 0, 0, VMEM_ADDR_MIN,
 	    VMEM_ADDR_MAX, M_NOWAIT | M_NOVM | M_USE_RESERVE | M_BESTFIT,
 	    &addr) == 0) {
-		if (kmem_back(kmem_object, addr, bytes,
+		if (kmem_back(kernel_object, addr, bytes,
 		    M_NOWAIT | M_USE_RESERVE) == 0) {
 			mtx_unlock(&vmem_bt_lock);
 			return ((void *)addr);
 		}
-		vmem_xfree(kmem_arena, addr, bytes);
+		vmem_xfree(kernel_arena, addr, bytes);
 		mtx_unlock(&vmem_bt_lock);
 		/*
 		 * Out of memory, not address space.  This may not even be
@@ -832,6 +833,9 @@ vmem_import(vmem_t *vm, vmem_size_t size, vmem_size_t 
 	vmem_addr_t addr;
 	int error;
 
+	if (vm->vm_limit != 0 && vm->vm_limit < vm->vm_size + size)
+		return ENOMEM;
+
 	if (vm->vm_importfn == NULL)
 		return EINVAL;
 
@@ -976,6 +980,15 @@ vmem_set_import(vmem_t *vm, vmem_import_t *importfn,
 }
 
 void
+vmem_set_limit(vmem_t *vm, vmem_size_t limit)
+{
+
+	VMEM_LOCK(vm);
+	vm->vm_limit = limit;
+	VMEM_UNLOCK(vm);
+}
+
+void
 vmem_set_reclaim(vmem_t *vm, vmem_reclaim_t *reclaimfn)
 {
 
@@ -1007,6 +1020,7 @@ vmem_init(vmem_t *vm, const char *name, vmem_addr_t ba
 	vm->vm_quantum_shift = flsl(quantum) - 1;
 	vm->vm_nbusytag = 0;
 	vm->vm_size = 0;
+	vm->vm_limit = 0;
 	vm->vm_inuse = 0;
 	qc_init(vm, qcache_max);
 

Modified: user/jeff/numa/sys/sys/vmem.h
==============================================================================
--- user/jeff/numa/sys/sys/vmem.h	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/sys/vmem.h	Mon Nov 13 03:41:50 2017	(r325754)
@@ -74,6 +74,12 @@ void vmem_set_import(vmem_t *vm, vmem_import_t *import
     vmem_release_t *releasefn, void *arg, vmem_size_t import_quantum);
 
 /*
+ * Set a limit on the total size of a vmem.
+ */
+
+void vmem_set_limit(vmem_t *vm, vmem_size_t limit);
+
+/*
  * Set a callback for reclaiming memory when space is exhausted:
  */
 void vmem_set_reclaim(vmem_t *vm, vmem_reclaim_t *reclaimfn);

Modified: user/jeff/numa/sys/vm/memguard.c
==============================================================================
--- user/jeff/numa/sys/vm/memguard.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/memguard.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -64,7 +64,7 @@ __FBSDID("$FreeBSD$");
 
 static SYSCTL_NODE(_vm, OID_AUTO, memguard, CTLFLAG_RW, NULL, "MemGuard data");
 /*
- * The vm_memguard_divisor variable controls how much of kmem_map should be
+ * The vm_memguard_divisor variable controls how much of kernel_arena should be
  * reserved for MemGuard.
  */
 static u_int vm_memguard_divisor;
@@ -155,7 +155,7 @@ SYSCTL_ULONG(_vm_memguard, OID_AUTO, frequency_hits, C
 
 /*
  * Return a fudged value to be used for vm_kmem_size for allocating
- * the kmem_map.  The memguard memory will be a submap.
+ * the kernel_arena.  The memguard memory will be a submap.
  */
 unsigned long
 memguard_fudge(unsigned long km_size, const struct vm_map *parent_map)
@@ -346,7 +346,7 @@ memguard_alloc(unsigned long req_size, int flags)
 	addr = origaddr;
 	if (do_guard)
 		addr += PAGE_SIZE;
-	rv = kmem_back(kmem_object, addr, size_p, flags);
+	rv = kmem_back(kernel_object, addr, size_p, flags);
 	if (rv != KERN_SUCCESS) {
 		vmem_xfree(memguard_arena, origaddr, size_v);
 		memguard_fail_pgs++;
@@ -416,7 +416,7 @@ memguard_free(void *ptr)
 	 * vm_map lock to serialize updates to memguard_wasted, since
 	 * we had the lock at increment.
 	 */
-	kmem_unback(kmem_object, addr, size);
+	kmem_unback(kernel_object, addr, size);
 	if (sizev > size)
 		addr -= PAGE_SIZE;
 	vmem_xfree(memguard_arena, addr, sizev);

Modified: user/jeff/numa/sys/vm/uma.h
==============================================================================
--- user/jeff/numa/sys/vm/uma.h	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/uma.h	Mon Nov 13 03:41:50 2017	(r325754)
@@ -607,12 +607,11 @@ void uma_zone_set_freef(uma_zone_t zone, uma_free free
  * These flags are setable in the allocf and visible in the freef.
  */
 #define UMA_SLAB_BOOT	0x01		/* Slab alloced from boot pages */
-#define UMA_SLAB_KMEM	0x02		/* Slab alloced from kmem_map */
 #define UMA_SLAB_KERNEL	0x04		/* Slab alloced from kernel_map */
 #define UMA_SLAB_PRIV	0x08		/* Slab alloced from priv allocator */
 #define UMA_SLAB_OFFP	0x10		/* Slab is managed separately  */
 #define UMA_SLAB_MALLOC	0x20		/* Slab is a large malloc slab */
-/* 0x40 and 0x80 are available */
+/* 0x02, 0x40 and 0x80 are available */
 
 /*
  * Used to pre-fill a zone with some number of items

Modified: user/jeff/numa/sys/vm/uma_core.c
==============================================================================
--- user/jeff/numa/sys/vm/uma_core.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/uma_core.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -1077,8 +1077,8 @@ page_alloc(uma_zone_t zone, vm_size_t bytes, uint8_t *
 {
 	void *p;	/* Returned page */
 
-	*pflag = UMA_SLAB_KMEM;
-	p = (void *) kmem_malloc(kmem_arena, bytes, wait);
+	*pflag = UMA_SLAB_KERNEL;
+	p = (void *) kmem_malloc(kernel_arena, bytes, wait);
 
 	return (p);
 }
@@ -1159,9 +1159,7 @@ page_free(void *mem, vm_size_t size, uint8_t flags)
 {
 	struct vmem *vmem;
 
-	if (flags & UMA_SLAB_KMEM)
-		vmem = kmem_arena;
-	else if (flags & UMA_SLAB_KERNEL)
+	if (flags & UMA_SLAB_KERNEL)
 		vmem = kernel_arena;
 	else
 		panic("UMA: page_free used with invalid flags %x", flags);

Modified: user/jeff/numa/sys/vm/vm_kern.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_kern.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/vm_kern.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -162,11 +162,13 @@ vm_offset_t
 kmem_alloc_attr(vmem_t *vmem, vm_size_t size, int flags, vm_paddr_t low,
     vm_paddr_t high, vm_memattr_t memattr)
 {
-	vm_object_t object = vmem == kmem_arena ? kmem_object : kernel_object;
+	vm_object_t object = kernel_object;
 	vm_offset_t addr, i, offset;
 	vm_page_t m;
 	int pflags, tries;
 
+	KASSERT(vmem == kernel_arena,
+	    ("kmem_alloc_attr: Only kernel_arena is supported."));
 	size = round_page(size);
 	if (vmem_alloc(vmem, size, M_BESTFIT | flags, &addr))
 		return (0);
@@ -218,12 +220,14 @@ kmem_alloc_contig(struct vmem *vmem, vm_size_t size, i
     vm_paddr_t high, u_long alignment, vm_paddr_t boundary,
     vm_memattr_t memattr)
 {
-	vm_object_t object = vmem == kmem_arena ? kmem_object : kernel_object;
+	vm_object_t object = kernel_object;
 	vm_offset_t addr, offset, tmp;
 	vm_page_t end_m, m;
 	u_long npages;
 	int pflags, tries;
  
+	KASSERT(vmem == kernel_arena,
+	    ("kmem_alloc_contig: Only kernel_arena is supported."));
 	size = round_page(size);
 	if (vmem_alloc(vmem, size, flags | M_BESTFIT, &addr))
 		return (0);
@@ -312,12 +316,13 @@ kmem_malloc(struct vmem *vmem, vm_size_t size, int fla
 	vm_offset_t addr;
 	int rv;
 
+	KASSERT(vmem == kernel_arena,
+	    ("kmem_malloc: Only kernel_arena is supported."));
 	size = round_page(size);
 	if (vmem_alloc(vmem, size, flags | M_BESTFIT, &addr))
 		return (0);
 
-	rv = kmem_back((vmem == kmem_arena) ? kmem_object : kernel_object,
-	    addr, size, flags);
+	rv = kmem_back(kernel_object, addr, size, flags);
 	if (rv != KERN_SUCCESS) {
 		vmem_free(vmem, addr, size);
 		return (0);
@@ -337,8 +342,8 @@ kmem_back(vm_object_t object, vm_offset_t addr, vm_siz
 	vm_page_t m, mpred;
 	int pflags;
 
-	KASSERT(object == kmem_object || object == kernel_object,
-	    ("kmem_back: only supports kernel objects."));
+	KASSERT(object == kernel_object,
+	    ("kmem_back: only supports kernel object."));
 
 	offset = addr - VM_MIN_KERNEL_ADDRESS;
 	pflags = malloc2vm_flags(flags) | VM_ALLOC_NOBUSY | VM_ALLOC_WIRED;
@@ -394,8 +399,8 @@ kmem_unback(vm_object_t object, vm_offset_t addr, vm_s
 	vm_page_t m, next;
 	vm_offset_t end, offset;
 
-	KASSERT(object == kmem_object || object == kernel_object,
-	    ("kmem_unback: only supports kernel objects."));
+	KASSERT(object == kernel_object,
+	    ("kmem_unback: only supports kernel object."));
 
 	pmap_remove(kernel_pmap, addr, addr + size);
 	offset = addr - VM_MIN_KERNEL_ADDRESS;
@@ -420,9 +425,10 @@ void
 kmem_free(struct vmem *vmem, vm_offset_t addr, vm_size_t size)
 {
 
+	KASSERT(vmem == kernel_arena,
+	    ("kmem_free: Only kernel_arena is supported."));
 	size = round_page(size);
-	kmem_unback((vmem == kmem_arena) ? kmem_object : kernel_object,
-	    addr, size);
+	kmem_unback(kernel_object, addr, size);
 	vmem_free(vmem, addr, size);
 }
 

Modified: user/jeff/numa/sys/vm/vm_map.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_map.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/vm_map.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -1187,9 +1187,9 @@ vm_map_insert(vm_map_t map, vm_object_t object, vm_oof
 	vm_inherit_t inheritance;
 
 	VM_MAP_ASSERT_LOCKED(map);
-	KASSERT((object != kmem_object && object != kernel_object) ||
+	KASSERT(object != kernel_object ||
 	    (cow & MAP_COPY_ON_WRITE) == 0,
-	    ("vm_map_insert: kmem or kernel object and COW"));
+	    ("vm_map_insert: kernel object and COW"));
 	KASSERT(object == NULL || (cow & MAP_NOFAULT) == 0,
 	    ("vm_map_insert: paradoxical MAP_NOFAULT request"));
 	KASSERT((prot & ~max) == 0,
@@ -2988,7 +2988,7 @@ vm_map_entry_delete(vm_map_t map, vm_map_entry_t entry
 		VM_OBJECT_WLOCK(object);
 		if (object->ref_count != 1 && ((object->flags & (OBJ_NOSPLIT |
 		    OBJ_ONEMAPPING)) == OBJ_ONEMAPPING ||
-		    object == kernel_object || object == kmem_object)) {
+		    object == kernel_object)) {
 			vm_object_collapse(object);
 
 			/*

Modified: user/jeff/numa/sys/vm/vm_object.c
==============================================================================
--- user/jeff/numa/sys/vm/vm_object.c	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/vm_object.c	Mon Nov 13 03:41:50 2017	(r325754)
@@ -142,7 +142,6 @@ struct object_q vm_object_list;
 struct mtx vm_object_list_mtx;	/* lock for object list and count */
 
 struct vm_object kernel_object_store;
-struct vm_object kmem_object_store;
 
 static SYSCTL_NODE(_vm_stats, OID_AUTO, object, CTLFLAG_RD, 0,
     "VM object stats");
@@ -290,14 +289,6 @@ vm_object_init(void)
 #if VM_NRESERVLEVEL > 0
 	kernel_object->flags |= OBJ_COLORED;
 	kernel_object->pg_color = (u_short)atop(VM_MIN_KERNEL_ADDRESS);
-#endif
-
-	rw_init(&kmem_object->lock, "kmem vm object");
-	_vm_object_allocate(OBJT_PHYS, atop(VM_MAX_KERNEL_ADDRESS -
-	    VM_MIN_KERNEL_ADDRESS), kmem_object);
-#if VM_NRESERVLEVEL > 0
-	kmem_object->flags |= OBJ_COLORED;
-	kmem_object->pg_color = (u_short)atop(VM_MIN_KERNEL_ADDRESS);
 #endif
 
 	/*

Modified: user/jeff/numa/sys/vm/vm_object.h
==============================================================================
--- user/jeff/numa/sys/vm/vm_object.h	Mon Nov 13 03:34:55 2017	(r325753)
+++ user/jeff/numa/sys/vm/vm_object.h	Mon Nov 13 03:41:50 2017	(r325754)
@@ -225,10 +225,10 @@ extern struct object_q vm_object_list;	/* list of allo
 extern struct mtx vm_object_list_mtx;	/* lock for object list and count */
 
 extern struct vm_object kernel_object_store;
-extern struct vm_object kmem_object_store;
 
+/* kernel and kmem are aliased for backwards KPI compat. */
 #define	kernel_object	(&kernel_object_store)
-#define	kmem_object	(&kmem_object_store)
+#define	kmem_object	(&kernel_object_store)
 
 #define	VM_OBJECT_ASSERT_LOCKED(object)					\
 	rw_assert(&(object)->lock, RA_LOCKED)

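For legacy and out-of-tree consumers, the vm_object.h hunk keeps the kmem_object name compiling by aliasing it to kernel_object_store, so both macros resolve to the same object.  An illustrative sketch (example_kpi_compat is hypothetical, not from the commit):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/rwlock.h>
#include <vm/vm.h>
#include <vm/vm_object.h>

static void
example_kpi_compat(void)
{

	/* Both names now expand to &kernel_object_store. */
	KASSERT(kmem_object == kernel_object,
	    ("kmem_object should alias kernel_object"));

	/* Locking "kmem_object" therefore takes kernel_object's lock. */
	VM_OBJECT_WLOCK(kmem_object);
	VM_OBJECT_WUNLOCK(kmem_object);
}
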
From owner-svn-src-user@freebsd.org  Mon Nov 13 23:33:09 2017
From: Jeff Roberson <jeff@FreeBSD.org>
Date: Mon, 13 Nov 2017 23:33:08 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325784 - in user/jeff/numa/sys: kern vm

Author: jeff
Date: Mon Nov 13 23:33:07 2017
New Revision: 325784
URL: https://svnweb.freebsd.org/changeset/base/325784

Log:
  Use a soft limit for kmem implemented within uma.  Part of r325754

Modified:
  user/jeff/numa/sys/kern/kern_malloc.c
  user/jeff/numa/sys/kern/subr_vmem.c
  user/jeff/numa/sys/vm/uma_core.c
  user/jeff/numa/sys/vm/uma_int.h

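In outline, the diffs below move the kmem ceiling out of the kernel_arena vmem and into UMA itself: UMA keeps an atomic count of the bytes its backend allocators hand it, kmeminit() seeds the limit, and crossing it only wakes the reclaim worker rather than failing the allocation (hence a "soft" limit).  A condensed sketch of the flow, paraphrasing the uma_core.c hunks (sketch_account_alloc is a hypothetical name):

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/uma.h>

/* Bytes under UMA management and the soft ceiling seeded from kmeminit(). */
static unsigned long uma_kmem_limit;
static volatile unsigned long uma_kmem_total;

static inline void
sketch_account_alloc(unsigned long size)	/* called on slab allocation */
{

	if (atomic_fetchadd_long(&uma_kmem_total, size) > uma_kmem_limit)
		uma_reclaim_wakeup();		/* reclaim instead of failing */
}
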
Modified: user/jeff/numa/sys/kern/kern_malloc.c
==============================================================================
--- user/jeff/numa/sys/kern/kern_malloc.c	Mon Nov 13 23:21:17 2017	(r325783)
+++ user/jeff/numa/sys/kern/kern_malloc.c	Mon Nov 13 23:33:07 2017	(r325784)
@@ -237,16 +237,22 @@ sysctl_kmem_map_size(SYSCTL_HANDLER_ARGS)
 {
 	u_long size;
 
-	size = vmem_size(kernel_arena, VMEM_ALLOC);
+	size = uma_size();
 	return (sysctl_handle_long(oidp, &size, 0, req));
 }
 
 static int
 sysctl_kmem_map_free(SYSCTL_HANDLER_ARGS)
 {
-	u_long size;
+	u_long size, limit;
 
-	size = vmem_size(kernel_arena, VMEM_FREE);
+	/* The sysctl is unsigned, implement as a saturation value. */
+	size = uma_size();
+	limit = uma_limit();
+	if (size > limit)
+		size = 0;
+	else
+		size = limit - size;
 	return (sysctl_handle_long(oidp, &size, 0, req));
 }
 
@@ -667,19 +673,6 @@ reallocf(void *addr, unsigned long size, struct malloc
 	return (mem);
 }
 
-/*
- * Wake the uma reclamation pagedaemon thread when we exhaust KVA.  It
- * will call the lowmem handler and uma_reclaim() callbacks in a
- * context that is safe.
- */
-static void
-kmem_reclaim(vmem_t *vm, int flags)
-{
-
-	uma_reclaim_wakeup();
-	pagedaemon_wakeup();
-}
-
 #ifndef __sparc64__
 CTASSERT(VM_KMEM_SIZE_SCALE >= 1);
 #endif
@@ -757,8 +750,7 @@ kmeminit(void)
 #else
 	tmp = vm_kmem_size;
 #endif
-	vmem_set_limit(kernel_arena, tmp);
-	vmem_set_reclaim(kernel_arena, kmem_reclaim);
+	uma_set_limit(tmp);
 
 #ifdef DEBUG_MEMGUARD
 	/*

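The sysctl_kmem_map_free() change above has to cope with the limit being soft: uma_size() can exceed uma_limit() for a while, and the sysctl reports an unsigned quantity, so the handler saturates at zero instead of letting limit - size wrap around.  A small sketch of the arithmetic (sketch_kmem_free and the sample values are illustrative only):

#include <sys/types.h>

static u_long
sketch_kmem_free(u_long size, u_long limit)
{

	return (size > limit ? 0 : limit - size);
}

/* sketch_kmem_free(900, 1000) == 100; sketch_kmem_free(1100, 1000) == 0. */
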
Modified: user/jeff/numa/sys/kern/subr_vmem.c
==============================================================================
--- user/jeff/numa/sys/kern/subr_vmem.c	Mon Nov 13 23:21:17 2017	(r325783)
+++ user/jeff/numa/sys/kern/subr_vmem.c	Mon Nov 13 23:33:07 2017	(r325784)
@@ -833,9 +833,6 @@ vmem_import(vmem_t *vm, vmem_size_t size, vmem_size_t 
 	vmem_addr_t addr;
 	int error;
 
-	if (vm->vm_limit != 0 && vm->vm_limit < vm->vm_size + size)
-		return ENOMEM;
-
 	if (vm->vm_importfn == NULL)
 		return EINVAL;
 
@@ -846,6 +843,9 @@ vmem_import(vmem_t *vm, vmem_size_t size, vmem_size_t 
 	if (align != vm->vm_quantum_mask + 1)
 		size = (align * 2) + size;
 	size = roundup(size, vm->vm_import_quantum);
+
+	if (vm->vm_limit != 0 && vm->vm_limit < vm->vm_size + size)
+		return ENOMEM;
 
 	/*
 	 * Hide MAXALLOC tags so we're guaranteed to be able to add this

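The subr_vmem.c hunks only reorder vmem_import(): the vm_limit check now runs after the request has been padded for alignment and rounded up to vm_import_quantum, presumably so the limit is enforced against the size that will actually be imported rather than the caller's nominal request.  A sketch of the ordering (sketch_import_check and the example values are hypothetical):

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/vmem.h>

static int
sketch_import_check(vmem_size_t size, vmem_size_t import_quantum,
    vmem_size_t vm_size, vmem_size_t vm_limit)
{

	size = roundup(size, import_quantum);		/* pad first ... */
	if (vm_limit != 0 && vm_limit < vm_size + size)
		return (ENOMEM);			/* ... then enforce */
	return (0);
}

/* With a 4 KB request, a 64 KB import quantum and 32 KB of headroom under
 * the limit, checking before the roundup would admit a 64 KB import;
 * checking after it returns ENOMEM. */
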
Modified: user/jeff/numa/sys/vm/uma_core.c
==============================================================================
--- user/jeff/numa/sys/vm/uma_core.c	Mon Nov 13 23:21:17 2017	(r325783)
+++ user/jeff/numa/sys/vm/uma_core.c	Mon Nov 13 23:33:07 2017	(r325784)
@@ -145,6 +145,10 @@ static struct mtx uma_boot_pages_mtx;
 
 static struct sx uma_drain_lock;
 
+/* kmem soft limit. */
+static unsigned long uma_kmem_limit;
+static volatile unsigned long uma_kmem_total;
+
 /* Is the VM done starting up? */
 static int booted = 0;
 #define	UMA_STARTUP	1
@@ -283,6 +287,22 @@ static int zone_warnings = 1;
 SYSCTL_INT(_vm, OID_AUTO, zone_warnings, CTLFLAG_RWTUN, &zone_warnings, 0,
     "Warn when UMA zones becomes full");
 
+/* Adjust bytes under management by UMA. */
+static inline void
+uma_total_dec(unsigned long size)
+{
+
+	atomic_subtract_long(&uma_kmem_total, size);
+}
+
+static inline void
+uma_total_inc(unsigned long size)
+{
+
+	if (atomic_fetchadd_long(&uma_kmem_total, size) > uma_kmem_limit)
+		uma_reclaim_wakeup();
+}
+
 /*
  * This routine checks to see whether or not it's safe to enable buckets.
  */
@@ -829,6 +849,7 @@ keg_free_slab(uma_keg_t keg, uma_slab_t slab, int star
 	if (keg->uk_flags & UMA_ZONE_OFFPAGE)
 		zone_free_item(keg->uk_slabzone, slab, NULL, SKIP_NONE);
 	keg->uk_freef(mem, PAGE_SIZE * keg->uk_ppera, flags);
+	uma_total_dec(PAGE_SIZE * keg->uk_ppera);
 }
 
 /*
@@ -933,6 +954,7 @@ keg_alloc_slab(uma_keg_t keg, uma_zone_t zone, int wai
 {
 	uma_alloc allocf;
 	uma_slab_t slab;
+	unsigned long size;
 	uint8_t *mem;
 	uint8_t flags;
 	int i;
@@ -943,6 +965,7 @@ keg_alloc_slab(uma_keg_t keg, uma_zone_t zone, int wai
 
 	allocf = keg->uk_allocf;
 	KEG_UNLOCK(keg);
+	size = keg->uk_ppera * PAGE_SIZE;
 
 	if (keg->uk_flags & UMA_ZONE_OFFPAGE) {
 		slab = zone_alloc_item(keg->uk_slabzone, NULL, wait);
@@ -966,13 +989,14 @@ keg_alloc_slab(uma_keg_t keg, uma_zone_t zone, int wai
 		wait |= M_NODUMP;
 
 	/* zone is passed for legacy reasons. */
-	mem = allocf(zone, keg->uk_ppera * PAGE_SIZE, &flags, wait);
+	mem = allocf(zone, size, &flags, wait);
 	if (mem == NULL) {
 		if (keg->uk_flags & UMA_ZONE_OFFPAGE)
 			zone_free_item(keg->uk_slabzone, slab, NULL, SKIP_NONE);
 		slab = NULL;
 		goto out;
 	}
+	uma_total_inc(size);
 
 	/* Point the slab into the allocated memory */
 	if (!(keg->uk_flags & UMA_ZONE_OFFPAGE))
@@ -3128,14 +3152,14 @@ uma_reclaim(void)
 	sx_xunlock(&uma_drain_lock);
 }
 
-static int uma_reclaim_needed;
+static volatile int uma_reclaim_needed;
 
 void
 uma_reclaim_wakeup(void)
 {
 
-	uma_reclaim_needed = 1;
-	wakeup(&uma_reclaim_needed);
+	if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
+		wakeup(uma_reclaim);
 }
 
 void
@@ -3144,14 +3168,13 @@ uma_reclaim_worker(void *arg __unused)
 
 	sx_xlock(&uma_drain_lock);
 	for (;;) {
-		sx_sleep(&uma_reclaim_needed, &uma_drain_lock, PVM,
-		    "umarcl", 0);
+		sx_sleep(uma_reclaim, &uma_drain_lock, PVM, "umarcl", 0);
 		if (uma_reclaim_needed) {
-			uma_reclaim_needed = 0;
 			sx_xunlock(&uma_drain_lock);
 			EVENTHANDLER_INVOKE(vm_lowmem, VM_LOW_KMEM);
 			sx_xlock(&uma_drain_lock);
 			uma_reclaim_locked(true);
+			atomic_set_int(&uma_reclaim_needed, 0);
 		}
 	}
 }
@@ -3215,6 +3238,27 @@ uma_zero_item(void *item, uma_zone_t zone)
 			bzero(zpcpu_get_cpu(item, i), zone->uz_size);
 	} else
 		bzero(item, zone->uz_size);
+}
+
+unsigned long
+uma_limit(void)
+{
+
+	return uma_kmem_limit;
+}
+
+void
+uma_set_limit(unsigned long limit)
+{
+	uma_kmem_limit = limit;
+}
+
+
+unsigned long
+uma_size(void)
+{
+
+	return uma_kmem_total;
 }
 
 void

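One detail of the reclaim path above is worth calling out: uma_reclaim_needed is now volatile and manipulated atomically, only the first requester issues wakeup() on the uma_reclaim channel (later requests merely bump the counter), and the worker clears the flag only once a drain has completed, presumably to avoid redundant wakeups while a pass is already running.  Condensed from the hunks above:

	/* Producer: first requester wakes the worker, later ones piggyback. */
	if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
		wakeup(uma_reclaim);

	/* Worker (holding uma_drain_lock): drain, then clear the flag. */
	if (uma_reclaim_needed) {
		/* vm_lowmem eventhandler and uma_reclaim_locked(true) run here. */
		atomic_set_int(&uma_reclaim_needed, 0);
	}
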
Modified: user/jeff/numa/sys/vm/uma_int.h
==============================================================================
--- user/jeff/numa/sys/vm/uma_int.h	Mon Nov 13 23:21:17 2017	(r325783)
+++ user/jeff/numa/sys/vm/uma_int.h	Mon Nov 13 23:33:07 2017	(r325784)
@@ -423,6 +423,13 @@ vsetslab(vm_offset_t va, uma_slab_t slab)
 void *uma_small_alloc(uma_zone_t zone, vm_size_t bytes, uint8_t *pflag,
     int wait);
 void uma_small_free(void *mem, vm_size_t size, uint8_t flags);
+
+/* Set a global soft limit on UMA managed memory. */
+void uma_set_limit(unsigned long limit);
+unsigned long uma_limit(void);
+
+/* Return the amount of memory managed by UMA. */
+unsigned long uma_size(void);
 #endif /* _KERNEL */
 
 #endif /* VM_UMA_INT_H */

From owner-svn-src-user@freebsd.org  Thu Nov 16 10:47:22 2017
From: Peter Holm <pho@FreeBSD.org>
Date: Thu, 16 Nov 2017 10:47:21 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: svn commit: r325889 - user/pho/stress2/misc

Author: pho
Date: Thu Nov 16 10:47:21 2017
New Revision: 325889
URL: https://svnweb.freebsd.org/changeset/base/325889

Log:
  Fix misunderstandings: test only valid stack_guard_page values (1 to 512);
  negative values are considered invalid.
  
  Sponsored by:	Dell EMC Isilon

Modified:
  user/pho/stress2/misc/stack_guard_page.sh

Modified: user/pho/stress2/misc/stack_guard_page.sh
==============================================================================
--- user/pho/stress2/misc/stack_guard_page.sh	Thu Nov 16 10:15:17 2017	(r325888)
+++ user/pho/stress2/misc/stack_guard_page.sh	Thu Nov 16 10:47:21 2017	(r325889)
@@ -28,9 +28,8 @@
 # $FreeBSD$
 #
 
-# Setting a negative guard page size will cause "Abort trap"
-# Reported by Shawn Webb <shawn.webb@hardenedbsd.org>
-# Fixed in r320560.
+# Test with stack_guard_page set between 1 and 512.
+# A negative value is considered invalid.
 
 [ `sysctl -n security.bsd.stack_guard_page` -eq 0 ] && exit 0
 
@@ -41,7 +40,7 @@ trap "sysctl security.bsd.stack_guard_page=$old" EXIT 
 
 start=`date +%s`
 while [ $((`date +%s` - start)) -lt 60 ]; do
-	sysctl security.bsd.stack_guard_page=`jot -r 1 -1 512` > \
+	sysctl security.bsd.stack_guard_page=`jot -r 1 1 512` > \
 	    /dev/null 2>&1
 	sleep 1
 done