From: Alan Cox <alc@cs.rice.edu>
Date: Sat, 31 Jul 2010 16:39:39 -0500
To: John Baldwin
Cc: alc@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
Message-ID: <4C54981B.9080209@cs.rice.edu>
In-Reply-To: <201007301614.40768.jhb@freebsd.org>
List-Id: Discussion related to FreeBSD architecture <freebsd-arch@freebsd.org>
John Baldwin wrote:
> On Friday, July 30, 2010 2:49:59 pm Alan Cox wrote:
>> John Baldwin wrote:
>>> I have a strawman of that (relative to 7).  It simply adjusts the
>>> hardcoded maximum to instead be a function of the amount of physical
>>> memory.
>>>
>> Unless I'm misreading this patch, it would allow "desiredvnodes" to grow
>> (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too
>> high) "virt" limit of about 329,000.  Yes?  For example, an 8GB i386/PAE
>> machine would have 60% more vnodes than was allowed by MAXVNODES_MAX, and
>> it would not stop there.  I think that we should be concerned about
>> that, because MAXVNODES_MAX came about because the "virt" limit wasn't
>> working.
>
> Agreed.
>
>> As the numbers above show, we could more than halve the growth rate for
>> "virt" and it would have no effect on either amd64 or i386 machines with
>> up to 1.5GB of RAM.  They would have just as many vnodes.  Then, with
>> that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at
>> least configure it to some absurdly large value), thereby relieving the
>> fixed cap on amd64, where it isn't needed.
>>
>> With that in mind, the following patch slows the growth of "virt" from
>> 2/5 of vm_kmem_size to 1/7.  This has no effect on amd64.  However, on
>> i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to
>> about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by
>> about 17%.  Once we exceed the old cap, we increase desiredvnodes at a
>> marginal rate that is almost the same as your patch, about 1% of
>> physical memory.  It's just computed differently.
>>
>> Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose
>> about 7% of their vnodes, but they catch up and pass the old limit by
>> 1.625GB.  Perhaps, more importantly, i386 machines only exceed the old
>> cap by 3%.
>>
>> Thoughts?
>
> I think this is much better.  My strawman was rather hackish in that it was
> layering a hack on top of the existing calculations.  I prefer your approach.
> I do not think penalizing amd64 machines with less than 1.5GB is a big worry
> as most x86 machines with a small amount of memory are probably running as
> i386 anyway.  Given that, I would probably lean towards 1/8 instead of 1/7,
> but I would be happy with either one.
>

I've looked a bit at an i386/PAE system with 8GB.  I don't think that a
default configuration, e.g., no changes to the mbuf limits, is at risk
with 1/7.

>> Index: kern/vfs_subr.c
>> ===================================================================
>> --- kern/vfs_subr.c	(revision 210504)
>> +++ kern/vfs_subr.c	(working copy)
>> @@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
>>   * Initialize the vnode management data structures.
>>   */
>>  #ifndef MAXVNODES_MAX
>> -#define	MAXVNODES_MAX	100000
>> +#define	MAXVNODES_MAX	8388608	/* Reevaluate when physmem exceeds 512GB. */
>>  #endif
>
> How is this value computed?  I would prefer something like:
>
> '512 * 1024 * 1024 * 1024 / (sizeof(struct vnode) + sizeof(struct vm_object) / N'
>
> if that is how it is computed.  A brief note about the magic number of 393216
> would also be nice to have (and if it could be a constant with a similar
> formula value that would be nice, too.).
>

I've tried to explain this computation below.

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210702)
+++ kern/vfs_subr.c	(working copy)
@@ -282,23 +282,34 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
 /*
  * Initialize the vnode management data structures.
+ *
+ * Reevaluate the following cap on the number of vnodes after the physical
+ * memory size exceeds 512GB.  In the limit, as the physical memory size
+ * grows, the ratio of physical pages to vnodes approaches sixteen to one.
  */
 #ifndef MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	(512 * (1024 * 1024 * 1024 / PAGE_SIZE / 16))
 #endif
 
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 98,304.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to
+	 * sixteen.  However, desiredvnodes is limited by the kernel's heap
+	 * size.  The memory required by desiredvnodes vnodes and vm objects
+	 * may not exceed one seventh of the kernel's heap size.
 	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(98304 * 4,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",