Date:      Fri, 30 Jul 2010 13:49:59 -0500
From:      Alan Cox <alc@cs.rice.edu>
To:        John Baldwin <jhb@freebsd.org>
Cc:        alc@freebsd.org, Matthew Fleming <mdf356@gmail.com>, Andriy Gapon <avg@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
Message-ID:  <4C531ED7.9010601@cs.rice.edu>
In-Reply-To: <201007270935.52082.jhb@freebsd.org>
References:  <4C4DB2B8.9080404@freebsd.org> <4C4DD1AA.3050906@freebsd.org> <AANLkTimWcXHAz=K1UM6ECa=6xR5KuS-sf_nDhbFEgehq@mail.gmail.com> <201007270935.52082.jhb@freebsd.org>

John Baldwin wrote:
> On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote:
>   
>> As far as eliminating or reducing the manual tuning that many ZFS users do,
>> I would love to see someone tackle the overly conservative hard limit that
>> we place on the number of vnode structures.  The current hard limit was put
>> in place when we had just introduced mutexes into many structures, and a
>> mutex was much larger than it is today.
>>     
>
>   

I took a look at the history of the "desiredvnodes" computation.  Prior 
to r115266, in May of 2003, the computation was based on physical memory 
and there was no MAXVNODES_MAX limit.  It was simply:

desiredvnodes = maxproc + cnt.v_page_count / 4;

r115266 introduced the min() that also took into account the virtual 
address space limit on the heap.  As I recall, it was to stop "kmem_map 
too small" panics.  In fact, I was asked to make this change by re@.

Finally, in August 2004, r133038 introduced MAXVNODES_MAX.  The commit 
message doesn't say, but I think the motivation was again to stop 
"kmem_map too small" panics.  In effect, the virtual address space limit 
introduced by r115266 wasn't working.

Enough history, here are some data points for the "desiredvnodes" 
computation on amd64 and i386 above and below the point where 
MAXVNODES_MAX has an effect.  "phys" is the number of vnodes that would 
be allowed based upon physical memory size, and "virt" is the number of 
vnodes that would be allowed based upon virtual memory size.

amd64:

2GB

phys: 132668
virt: 397057

1.5GB
phys: 100862
virt: 297228

1GB
phys: 69056
virt: 197398

512MB
phys: 35106
virt: 97569
 
i386:

2GB

phys: 134106
virt: 328965

1.5GB

phys: 101916
virt: 328965

1GB

phys: 69725
virt: 328965

512MB

phys: 35576
virt: 168875

For both architectures, the "phys" limit is the limiting factor until we 
reach about 1.5GB of physical memory.  MAXVNODES_MAX is only a factor on 
machines with more than 1.5GB of RAM.  So, whatever change we might make 
to MAXVNODES_MAX shouldn't affect the small embedded systems that are 
running FreeBSD.

Even though "virt" is never a factor on amd64, it's worth noticing that 
in both absolute and relative terms "virt" grows faster than "phys".  On 
i386, "virt" starts out larger than on amd64 because a vnode and a 
vm_object are smaller relative to vm_kmem_size, but "virt" reaches its 
maximum by 1GB of RAM because vm_kmem_size has already reached its 
maximum by then.  Nonetheless, even on i386, "virt" is never a factor.  
(For what it's worth, if I extrapolate, an i386/PAE machine with greater 
than 5GB of RAM will have a larger "phys" than "virt".)

> I have a strawman of that (relative to 7).  It simply adjusts the hardcoded 
> maximum to instead be a function of the amount of physical memory.
>
>   

Unless I'm misreading this patch, it would allow "desiredvnodes" to grow 
(slowly) on i386/PAE starting at 5GB of RAM until we reach the (too 
high) "virt" limit of about 329,000.  Yes?  For example, an 8GB i386/PAE 
machine would have 60% more vnodes than was allowed by MAXVNODES_MAX, and 
it would not stop there.  I think that we should be concerned about 
that, because MAXVNODES_MAX came about because the "virt" limit wasn't 
working.

As the numbers above show, we could more than halve the growth rate for 
"virt" and it would have no effect on either amd64 or i386 machines with 
up to 1.5GB of RAM.  They would have just as many vnodes.  Then, with 
that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at 
least configure it to some absurdly large value), thereby relieving the 
fixed cap on amd64, where it isn't needed.

With that in mind, the following patch slows the growth of "virt" from 
2/5 of vm_kmem_size to 1/7.  This has no effect on amd64.  However, on 
i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to 
about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by 
about 17%.  Once we exceed the old cap, we increase desiredvnodes at a 
marginal rate that is almost the same as your patch, about 1% of 
physical memory.  It's just computed differently.

Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose 
about 7% of their vnodes, but they catch up and pass the old limit by 
1.625GB.  Perhaps, more importantly, i386 machines only exceed the old 
cap by 3%.

Thoughts?

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c     (revision 210504)
+++ kern/vfs_subr.c     (working copy)
@@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
  * Initialize the vnode management data structures.
  */
 #ifndef        MAXVNODES_MAX
-#define        MAXVNODES_MAX   100000
+#define        MAXVNODES_MAX   8388608 /* Reevaluate when physmem exceeds 512GB. */
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+       int physvnodes, virtvnodes;
 
        /*
-        * Desiredvnodes is a function of the physical memory size and
-        * the kernel's heap size.  Specifically, desiredvnodes scales
-        * in proportion to the physical memory size until two fifths
-        * of the kernel's heap size is consumed by vnodes and vm
-        * objects.
+        * Desiredvnodes is a function of the physical memory size and the
+        * kernel's heap size.  Generally speaking, it scales with the
+        * physical memory size.  The ratio of desiredvnodes to physical pages
+        * is one to four until desiredvnodes exceeds 96K.  Thereafter, the
+        * marginal ratio of desiredvnodes to physical pages is one to sixteen.
+        * However, desiredvnodes is limited by the kernel's heap size.  The
+        * memory required by desiredvnodes vnodes and vm objects may not
+        * exceed one seventh of the kernel's heap size.
         */
-       desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-           (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+       physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216,
+           cnt.v_page_count) / 16;
+       virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+           sizeof(struct vnode)));
+       printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, virtvnodes);
+       desiredvnodes = min(physvnodes, virtvnodes);
        if (desiredvnodes > MAXVNODES_MAX) {
                if (bootverbose)
                        printf("Reducing kern.maxvnodes %d -> %d\n",


> Index: vfs_subr.c
> ===================================================================
> --- vfs_subr.c	(revision 210934)
> +++ vfs_subr.c	(working copy)
> @@ -288,6 +288,7 @@
>  static void
>  vntblinit(void *dummy __unused)
>  {
> +	int vnodes;
>  
>  	/*
>  	 * Desiredvnodes is a function of the physical memory size and
> @@ -299,10 +300,19 @@
>  	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
>  	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
>  	if (desiredvnodes > MAXVNODES_MAX) {
> +
> +		/*
> +		 * If there is a lot of physical memory, allow the cap
> +		 * on vnodes to expand to using a little under 1% of
> +		 * available RAM.
> +		 */
> +		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
> +		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
> +		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
>  		if (bootverbose)
>  			printf("Reducing kern.maxvnodes %d -> %d\n",
> -			    desiredvnodes, MAXVNODES_MAX);
> -		desiredvnodes = MAXVNODES_MAX;
> +			    desiredvnodes, vnodes);
> +		desiredvnodes = vnodes;
>  	}
>  	wantfreevnodes = desiredvnodes / 4;
>  	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);
>
>   



