From owner-svn-src-stable@freebsd.org Wed Jun 21 14:39:32 2017
Delivered-To: svn-src-stable@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7E5B0D9108B;
	Wed, 21 Jun 2017 14:39:32 +0000 (UTC)
	(envelope-from jhb@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 3B91C6F371;
	Wed, 21 Jun 2017 14:39:32 +0000 (UTC)
	(envelope-from jhb@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
	by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id v5LEdVdb070592;
	Wed, 21 Jun 2017 14:39:31 GMT
	(envelope-from jhb@FreeBSD.org)
Received: (from jhb@localhost)
	by repo.freebsd.org (8.15.2/8.15.2/Submit) id v5LEdVYq070591;
	Wed, 21 Jun 2017 14:39:31 GMT
	(envelope-from jhb@FreeBSD.org)
Message-Id: <201706211439.v5LEdVYq070591@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: jhb set sender to jhb@FreeBSD.org using -f
From: John Baldwin
Date: Wed, 21 Jun 2017 14:39:31 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r320190 - stable/10/sys/vm
X-SVN-Group: stable-10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-stable@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SVN commit messages for all the -stable branches of the src tree
X-List-Received-Date: Wed, 21 Jun 2017 14:39:32 -0000

Author: jhb
Date: Wed Jun 21 14:39:31 2017
New Revision: 320190
URL: https://svnweb.freebsd.org/changeset/base/320190

Log:
  MFC 313186, 319702:
  Account for overhead of page structures when sizing page array.

  313186:
  Over the years, the code and comments in vm_page_startup() have diverged
  in one respect.  When determining how many page structures to allocate,
  contrary to what the comments say, the code does not account for the
  overhead of a page structure per page of physical memory.  This revision
  changes the code to match the comments.

  319702:
  Fix an off-by-one error in the VM page array on some systems.

  r313186 changed how the size of the VM page array was calculated to be
  less wasteful.  For most systems, the amount of memory is divided by the
  overhead required by each page (a page of data plus a struct vm_page) to
  determine the maximum number of available pages.  However, if the
  remainder for the first non-available page was at least a page of data
  (so that the only memory missing was a struct vm_page), this last page
  was left in phys_avail[] but was not allocated an entry in the VM page
  array.  Handle this case by explicitly excluding the page from
  phys_avail[].

  Requested by:	alc

Modified:
  stable/10/sys/vm/vm_page.c

Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/sys/vm/vm_page.c
==============================================================================
--- stable/10/sys/vm/vm_page.c	Wed Jun 21 14:38:52 2017	(r320189)
+++ stable/10/sys/vm/vm_page.c	Wed Jun 21 14:39:31 2017	(r320190)
@@ -275,17 +275,16 @@ vm_page_domain_init(struct vm_domain *vmd)
 /*
  *	vm_page_startup:
  *
- *	Initializes the resident memory module.
- *
- *	Allocates memory for the page cells, and
- *	for the object/offset-to-page hash table headers.
- *	Each page cell is initialized and placed on the free list.
+ *	Initializes the resident memory module.  Allocates physical memory for
+ *	bootstrapping UMA and some data structures that are used to manage
+ *	physical pages.  Initializes these structures, and populates the free
+ *	page queues.
  */
 vm_offset_t
 vm_page_startup(vm_offset_t vaddr)
 {
 	vm_offset_t mapped;
-	vm_paddr_t page_range;
+	vm_paddr_t high_avail, low_avail, page_range, size;
 	vm_paddr_t new_end;
 	int i;
 	vm_paddr_t pa;
@@ -295,7 +294,6 @@ vm_page_startup(vm_offset_t vaddr)
 	/* the biggest memory array is the second group of pages */
 	vm_paddr_t end;
 	vm_paddr_t biggestsize;
-	vm_paddr_t low_water, high_water;
 	int biggestone;
 
 	biggestsize = 0;
@@ -315,26 +313,12 @@ vm_page_startup(vm_offset_t vaddr)
 	vm_phys_add_seg(0, phys_avail[0]);
 #endif
 
-	low_water = phys_avail[0];
-	high_water = phys_avail[1];
-
-	for (i = 0; i < vm_phys_nsegs; i++) {
-		if (vm_phys_segs[i].start < low_water)
-			low_water = vm_phys_segs[i].start;
-		if (vm_phys_segs[i].end > high_water)
-			high_water = vm_phys_segs[i].end;
-	}
 	for (i = 0; phys_avail[i + 1]; i += 2) {
-		vm_paddr_t size = phys_avail[i + 1] - phys_avail[i];
-
+		size = phys_avail[i + 1] - phys_avail[i];
 		if (size > biggestsize) {
 			biggestone = i;
 			biggestsize = size;
 		}
-		if (phys_avail[i] < low_water)
-			low_water = phys_avail[i];
-		if (phys_avail[i + 1] > high_water)
-			high_water = phys_avail[i + 1];
 	}
 
 	end = phys_avail[biggestone+1];
@@ -383,6 +367,16 @@ vm_page_startup(vm_offset_t vaddr)
 	    new_end + vm_page_dump_size, VM_PROT_READ | VM_PROT_WRITE);
 	bzero((void *)vm_page_dump, vm_page_dump_size);
 #endif
+#if defined(__amd64__) || defined(__mips__)
+	/*
+	 * Include the UMA bootstrap pages and vm_page_dump in a crash dump.
+	 * When pmap_map() uses the direct map, they are not automatically
+	 * included.
+	 */
+	for (pa = new_end; pa < end; pa += PAGE_SIZE)
+		dump_add_page(pa);
+#endif
+	phys_avail[biggestone + 1] = new_end;
 #ifdef __amd64__
 	/*
 	 * Request that the physical pages underlying the message buffer be
@@ -398,33 +392,80 @@ vm_page_startup(vm_offset_t vaddr)
 #endif
 	/*
 	 * Compute the number of pages of memory that will be available for
-	 * use (taking into account the overhead of a page structure per
-	 * page).
+	 * use, taking into account the overhead of a page structure per page.
+	 * In other words, solve
+	 *	"available physical memory" - round_page(page_range *
+	 *	    sizeof(struct vm_page)) = page_range * PAGE_SIZE
+	 * for page_range.
 	 */
-	first_page = low_water / PAGE_SIZE;
-#ifdef VM_PHYSSEG_SPARSE
-	page_range = 0;
+	low_avail = phys_avail[0];
+	high_avail = phys_avail[1];
 	for (i = 0; i < vm_phys_nsegs; i++) {
-		page_range += atop(vm_phys_segs[i].end -
-		    vm_phys_segs[i].start);
+		if (vm_phys_segs[i].start < low_avail)
+			low_avail = vm_phys_segs[i].start;
+		if (vm_phys_segs[i].end > high_avail)
+			high_avail = vm_phys_segs[i].end;
 	}
+	/* Skip the first chunk.  It is already accounted for. */
+	for (i = 2; phys_avail[i + 1] != 0; i += 2) {
+		if (phys_avail[i] < low_avail)
+			low_avail = phys_avail[i];
+		if (phys_avail[i + 1] > high_avail)
+			high_avail = phys_avail[i + 1];
+	}
+	first_page = low_avail / PAGE_SIZE;
+#ifdef VM_PHYSSEG_SPARSE
+	size = 0;
+	for (i = 0; i < vm_phys_nsegs; i++)
+		size += vm_phys_segs[i].end - vm_phys_segs[i].start;
 	for (i = 0; phys_avail[i + 1] != 0; i += 2)
-		page_range += atop(phys_avail[i + 1] - phys_avail[i]);
+		size += phys_avail[i + 1] - phys_avail[i];
 #elif defined(VM_PHYSSEG_DENSE)
-	page_range = high_water / PAGE_SIZE - first_page;
+	size = high_avail - low_avail;
 #else
 #error "Either VM_PHYSSEG_DENSE or VM_PHYSSEG_SPARSE must be defined."
 #endif
+
+#ifdef VM_PHYSSEG_DENSE
+	/*
+	 * In the VM_PHYSSEG_DENSE case, the number of pages can account for
+	 * the overhead of a page structure per page only if vm_page_array is
+	 * allocated from the last physical memory chunk.  Otherwise, we must
+	 * allocate page structures representing the physical memory
+	 * underlying vm_page_array, even though they will not be used.
+	 */
+	if (new_end != high_avail)
+		page_range = size / PAGE_SIZE;
+	else
+#endif
+	{
+		page_range = size / (PAGE_SIZE + sizeof(struct vm_page));
+
+		/*
+		 * If the partial bytes remaining are large enough for
+		 * a page (PAGE_SIZE) without a corresponding
+		 * 'struct vm_page', then new_end will contain an
+		 * extra page after subtracting the length of the VM
+		 * page array.  Compensate by subtracting an extra
+		 * page from new_end.
+		 */
+		if (size % (PAGE_SIZE + sizeof(struct vm_page)) >= PAGE_SIZE) {
+			if (new_end == high_avail)
+				high_avail -= PAGE_SIZE;
+			new_end -= PAGE_SIZE;
+		}
+	}
 	end = new_end;
 
 	/*
 	 * Reserve an unmapped guard page to trap access to vm_page_array[-1].
+	 * However, because this page is allocated from KVM, out-of-bounds
+	 * accesses using the direct map will not be trapped.
 	 */
 	vaddr += PAGE_SIZE;
 
 	/*
-	 * Initialize the mem entry structures now, and put them in the free
-	 * queue.
+	 * Allocate physical memory for the page structures, and map it.
 	 */
 	new_end = trunc_page(end - page_range * sizeof(struct vm_page));
 	mapped = pmap_map(&vaddr, new_end, end,
@@ -432,19 +473,18 @@ vm_page_startup(vm_offset_t vaddr)
 	vm_page_array = (vm_page_t) mapped;
 #if VM_NRESERVLEVEL > 0
 	/*
-	 * Allocate memory for the reservation management system's data
-	 * structures.
+	 * Allocate physical memory for the reservation management system's
+	 * data structures, and map it.
 	 */
-	new_end = vm_reserv_startup(&vaddr, new_end, high_water);
+	if (high_avail == end)
+		high_avail = new_end;
+	new_end = vm_reserv_startup(&vaddr, new_end, high_avail);
 #endif
 #if defined(__amd64__) || defined(__mips__)
 	/*
-	 * pmap_map on amd64 and mips can come out of the direct-map, not kvm
-	 * like i386, so the pages must be tracked for a crashdump to include
-	 * this data. This includes the vm_page_array and the early UMA
-	 * bootstrap pages.
+	 * Include vm_page_array and vm_reserv_array in a crash dump.
 	 */
-	for (pa = new_end; pa < phys_avail[biggestone + 1]; pa += PAGE_SIZE)
+	for (pa = new_end; pa < end; pa += PAGE_SIZE)
 		dump_add_page(pa);
 #endif
 	phys_avail[biggestone + 1] = new_end;
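
The sizing arithmetic in the change above can be exercised outside the kernel.  The following is a minimal
userland sketch, not the kernel code: the 4096-byte page, the 104-byte struct vm_page, and the names
compute_page_range, SKETCH_PAGE_SIZE, and SKETCH_VM_PAGE_SIZE are assumptions chosen only for illustration.
It solves size = page_range * (PAGE_SIZE + sizeof(struct vm_page)) for page_range and flags the case
r319702 handles, where the remainder holds a page of data but has no room left for its page structure.

/*
 * Minimal userland sketch of the page-array sizing arithmetic.  The page
 * size and the size of struct vm_page below are illustrative assumptions,
 * not values taken from a real kernel.
 */
#include <stdint.h>
#include <stdio.h>

#define	SKETCH_PAGE_SIZE	4096	/* assumed page size */
#define	SKETCH_VM_PAGE_SIZE	104	/* assumed sizeof(struct vm_page) */

/*
 * Solve
 *	size - page_range * SKETCH_VM_PAGE_SIZE = page_range * SKETCH_PAGE_SIZE
 * for page_range: each usable page costs one page of data plus one page
 * structure.  *partial_excluded is set when the leftover bytes could hold a
 * page of data but not its page structure; that is the page which r319702
 * removes from phys_avail[] so it does not exist without a vm_page entry.
 */
static uint64_t
compute_page_range(uint64_t size, int *partial_excluded)
{
	uint64_t per_page = SKETCH_PAGE_SIZE + SKETCH_VM_PAGE_SIZE;

	*partial_excluded = (size % per_page >= SKETCH_PAGE_SIZE);
	return (size / per_page);
}

int
main(void)
{
	uint64_t size = 512ULL * 1024 * 1024;	/* 512 MiB of usable memory */
	uint64_t page_range;
	int excluded;

	page_range = compute_page_range(size, &excluded);
	printf("page_range = %ju pages, trailing partial page excluded: %s\n",
	    (uintmax_t)page_range, excluded ? "yes" : "no");
	return (0);
}

When the trailing partial page is flagged, the kernel change shrinks both new_end and high_avail by one
page, so phys_avail[] and the VM page array remain consistent; the sketch only reports the condition.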