From: Alan Cox
Date: Fri, 30 Jul 2010 13:49:59 -0500
To: John Baldwin
Cc: alc@freebsd.org, Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
Message-ID: <4C531ED7.9010601@cs.rice.edu>
In-Reply-To: <201007270935.52082.jhb@freebsd.org>

John Baldwin wrote:
> On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote:
>
>> As far as eliminating or reducing the manual tuning that many ZFS users do,
>> I would love to see someone tackle the overly conservative hard limit that
>> we place on the number of vnode structures. The current hard limit was put
>> in place when we had just introduced mutexes into many structures and a
>> mutex was much larger than it is today.
>>
>

I took a look at the history of the "desiredvnodes" computation. Prior to r115266, in May of 2003, the computation was based on physical memory and there was no MAXVNODES_MAX limit. It was simply:

    desiredvnodes = maxproc + cnt.v_page_count / 4;

r115266 introduced the min() that also took into account the virtual address space limit on the heap. As I recall, it was to stop "kmem_map too small" panics. In fact, I was asked to make this change by re@.

Finally, in August 2004, r133038 introduced MAXVNODES_MAX. The commit message doesn't say, but I think the motivation was again to stop "kmem_map too small" panics. In effect, the virtual address space limit introduced by r115266 wasn't working.

Enough history, here are some data points for the "desiredvnodes" computation on amd64 and i386 above and below the point where MAXVNODES_MAX has an effect. "phys" is the number of vnodes that would be allowed based upon physical memory size, and "virt" is the number of vnodes that would be allowed based upon virtual memory size.

amd64:
    2GB    phys: 132668  virt: 397057
    1.5GB  phys: 100862  virt: 297228
    1GB    phys:  69056  virt: 197398
    512MB  phys:  35106  virt:  97569

i386:
    2GB    phys: 134106  virt: 328965
    1.5GB  phys: 101916  virt: 328965
    1GB    phys:  69725  virt: 328965
    512MB  phys:  35576  virt: 168875

For both architectures, the "phys" limit is the limiting factor until we reach about 1.5GB of physical memory. MAXVNODES_MAX is only a factor on machines with more than 1.5GB of RAM. So, whatever change we might make to MAXVNODES_MAX shouldn't affect the small embedded systems that are running FreeBSD.

Even though "virt" is never a factor on amd64, it's worth noticing that in both absolute and relative terms "virt" grows faster than "phys".

On i386, "virt" starts out larger than on amd64 because a vnode and a vm_object are smaller relative to vm_kmem_size, but "virt" reaches its maximum by 1GB of RAM because vm_kmem_size has already reached its maximum by then. Nonetheless, even on i386, "virt" is never a factor. (For what it's worth, if I extrapolate, an i386/PAE machine with greater than 5GB of RAM will have a larger "phys" than "virt".)

> I have a strawman of that (relative to 7). It simply adjusts the hardcoded
> maximum to instead be a function of the amount of physical memory.
>

Unless I'm misreading this patch, it would allow "desiredvnodes" to grow (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE machine would have 60% more vnodes than was allowed by MAXVNODES_MAX, and it would not stop there. I think that we should be concerned about that, because MAXVNODES_MAX came about because the "virt" limit wasn't working.

As the numbers above show, we could more than halve the growth rate for "virt" and it would have no effect on either amd64 or i386 machines with up to 1.5GB of RAM. They would have just as many vnodes.
Then, with that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at least configure it to some absurdly large value), thereby relieving the fixed cap on amd64, where it isn't needed.

With that in mind, the following patch slows the growth of "virt" from 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by about 17%. Once we exceed the old cap, we increase desiredvnodes at a marginal rate that is almost the same as your patch, about 1% of physical memory. It's just computed differently.

Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose about 7% of their vnodes, but they catch up and pass the old limit by 1.625GB. Perhaps more importantly, i386 machines only exceed the old cap by 3%.

Thoughts?

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210504)
+++ kern/vfs_subr.c	(working copy)
@@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
  * Initialize the vnode management data structures.
  */
 #ifndef	MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	8388608	/* Reevaluate when physmem exceeds 512GB. */
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 96K.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to sixteen.
+	 * However, desiredvnodes is limited by the kernel's heap size.  The
+	 * memory required by desiredvnodes vnodes and vm objects may not
+	 * exceed one seventh of the kernel's heap size.
 	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, virtvnodes);
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",

> Index: vfs_subr.c
> ===================================================================
> --- vfs_subr.c	(revision 210934)
> +++ vfs_subr.c	(working copy)
> @@ -288,6 +288,7 @@
>  static void
>  vntblinit(void *dummy __unused)
>  {
> +	int vnodes;
>  
>  	/*
>  	 * Desiredvnodes is a function of the physical memory size and
> @@ -299,10 +300,19 @@
>  	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
>  	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
>  	if (desiredvnodes > MAXVNODES_MAX) {
> +
> +		/*
> +		 * If there is a lot of physical memory, allow the cap
> +		 * on vnodes to expand to using a little under 1% of
> +		 * available RAM.
> +		 */
> +		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
> +		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
> +		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
>  		if (bootverbose)
>  			printf("Reducing kern.maxvnodes %d -> %d\n",
> -			    desiredvnodes, MAXVNODES_MAX);
> -		desiredvnodes = MAXVNODES_MAX;
> +			    desiredvnodes, vnodes);
> +		desiredvnodes = vnodes;
>  	}
>  	wantfreevnodes = desiredvnodes / 4;
>  	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);
>
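
For anyone who wants to check the arithmetic outside the kernel, here is a rough userland sketch of the old and new "phys" and "virt" terms from the first patch above. The function names are made up for illustration and are not in vfs_subr.c. maxproc, the page count, the heap size, and the combined sizeof(struct vm_object) + sizeof(struct vnode) are passed in as plain parameters because their real values depend on the machine and kernel build. The 393216-page knee is 1.5GB only under the assumption of 4KB pages.

```c
/*
 * Userland sketch of the desiredvnodes terms.  All function names are
 * hypothetical; only the arithmetic is taken from the patch.
 */

static long long
min_ll(long long a, long long b)
{
	return (a < b ? a : b);
}

/* "phys" before the patch: one vnode per four physical pages. */
long long
old_physvnodes(long long maxproc, long long pages)
{
	return (maxproc + pages / 4);
}

/*
 * "phys" after the patch: one vnode per four pages up to 393216 pages
 * (98304 vnodes, i.e. 96K), then one vnode per sixteen pages beyond that.
 */
long long
new_physvnodes(long long maxproc, long long pages)
{
	return (maxproc + pages / 16 + 3 * min_ll(393216, pages) / 16);
}

/* "virt" before the patch: vnodes and vm objects may use 2/5 of the heap. */
long long
old_virtvnodes(long long kmem_size, long long obj_size)
{
	return (2 * kmem_size / (5 * obj_size));
}

/* "virt" after the patch: vnodes and vm objects may use 1/7 of the heap. */
long long
new_virtvnodes(long long kmem_size, long long obj_size)
{
	return (kmem_size / (7 * obj_size));
}
```

Below the knee the old and new "phys" terms agree whenever the page count is a multiple of 16; otherwise integer division can make the new term one smaller. Above the knee each additional 16 pages adds one vnode instead of four.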