Date:      Thu, 29 Mar 2018 21:23:08 -0400
From:      Curtis Villamizar <curtis@orleans.occnc.com>
To:        freebsd-stable@freebsd.org
Cc:        Curtis Villamizar <curtis@orleans.occnc.com>
Subject:   Re: kern.maxswzone causing serious problems
Message-ID:  <387a65b7-d221-0a10-b801-1dd573054e10@orleans.occnc.com>
In-Reply-To: <cmu-lmtpd-29464-1522294656-0@mda32.somerville.occnc.com>
References:  <cmu-lmtpd-29464-1522294656-0@mda32.somerville.occnc.com>

Replying to myself.

Oops - error in the diff:

-+                     (maxpages / SWAP_META_PAGES) * 2
++                     (npages / SWAP_META_PAGES) * 2

The prior code worked but only because it doubled the allowable swap.  
This provides the right value for kern.maxswzone for the amount of swap 
you have for the complaint to go away.  I confirmed that by reducing the 
kern.maxswzone value by a small amount (under 1K) and the complaint 
returned.  I am currently running a regression test over the host and 
the entire set of VMs running on it to make sure there are no more 
surprises. Since I generate configs it is easy to change the way 
kern.maxswzone is computed when generating boot/loader.conf for a given 
host or VM.
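
For anyone else generating loader.conf values, here is a minimal sketch of the corrected expression from the diff, (npages / SWAP_META_PAGES) * 2 * sizeof(struct swblk). The function name and its parameters are hypothetical. SWAP_META_PAGES and the swblk size are passed in rather than taken from kernel headers because both vary by platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of FreeBSD.  Mirrors the corrected
 * expression from the patch:
 *     (npages / SWAP_META_PAGES) * 2 * sizeof(struct swblk)
 * swap_meta_pages and swblk_size are assumptions supplied by the caller.
 */
uint64_t
suggested_maxswzone(uint64_t npages, uint64_t swap_meta_pages,
    uint64_t swblk_size)
{
	assert(swap_meta_pages != 0);
	return (npages / swap_meta_pages) * 2 * swblk_size;
}
```

Assuming the amd64 figures mentioned in this thread (16 pages per swblk, a struct swblk of 136 bytes), 40 GB of swap (10485760 pages) comes out to 178257920.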

Will check back later after regression testing.  Apparently the best way 
to get some attention for this is to file a bug, so once the changes 
are fully verified in the regression testing I'll do that.  I'm holding 
off until then just in case there is some further interaction with 
using more swap than recommended by the current code.

Curtis


On 03/28/18 23:36, Curtis Villamizar wrote:
> I'm starting to upgrade a set of servers from FreeBSD 11.0-STABLE #0
> r308356 (OK, rather old) to FreeBSD 11.1-STABLE #0 r331152 without
> much success.  I'm getting the occasionally discussed kern.maxswzone
> problems but I really do need to configure that swap space.
>
> On an upgraded server I'm getting (line continuation added for
> readability):
>
>     warning: total configured swap (5242880 pages) \
>        exceeds maximum recommended amount (112288 pages).
>     warning: increase kern.maxswzone or reduce amount of swap.
>     warning: total configured swap (10485760 pages) \
>        exceeds maximum recommended amount (112288 pages).
>     warning: increase kern.maxswzone or reduce amount of swap.
>
> The value previously used was not working.  I ended up temporarily
> cutting swap in half as well to get rid of the error.  This is only a
> symptom of a greater problem.
>
> This machine has for a long time run multiple VMs which total 17 GB of
> memory on a server with 8 GB of physical memory.  Normally things are
> fine because most of these servers are idle most of the time and all
> but a small memory footprint can be paged out.  I have for years had
> 40 GB swap, with 20 GB each on two spindles (at least going back to
> FreeBSD 9), on drives with identical partitioning and mirrored partitions.
>
> \begin{talesofwoe}  % ignore if busy
>
> When just trying to install base system software using md devices one
> at a time (no VM running yet) I had been getting "killed: out of swap
> space" messages.  I got rid of this by reducing the size of the VM
> root partitions (most VM run zfs on another partition with not much on
> root so this was OK).  Next problem was running the VM.  I worked
> around this by reducing some VM memory (-m in bhyve) by half.  I can
> get the whole set of VM to boot but now installing additional software
> (which involved just moving files with scp and running tar) doesn't
> work.  At this point I have 9 GB of VM running on 8 GB physical memory
> and still couldn't even install software.  Once the maxswzone message
> went away, so did all of these problems except my VMs now have half as
> much RAM each.  Now each of the VMs is reporting swap problems.  And
> this is just upgrading one server to 11.1.
>
> \end{talesofwoe}
>
> This is a major regression for FreeBSD 11.1.  The same value used in
> FreeBSD 11.0 should just work or there should be some documentation on
> how to set this (preferably in the error message).  If nothing else, some
> advice on converting a kern.maxswzone value from 11.0 to a working
> value for 11.1 would be nice.  The entry in the loader(8) man page is
> not very helpful.
>
> btw- Reporting swap size in MB or KB in the error message would be
> helpful.  In addition to pages would be fine.  Mentioning what the
> highest value kern.maxswzone could be set to would also be helpful.
> Changing "warning: increase kern.maxswzone" to "warning: increase
> kern.maxswzone to %d" would be very helpful.
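
The unit conversion asked for above is plain arithmetic. A minimal sketch, assuming 4096-byte pages (the page size is a parameter here; the helper name is hypothetical and this is not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a page count to whole megabytes
 * (1 MB = 1024 * 1024 bytes), truncating any remainder.
 */
uint64_t
pages_to_mb(uint64_t pages, uint64_t page_size)
{
	assert(page_size != 0);
	return pages * page_size / (1024 * 1024);
}
```

With 4096-byte pages, the 5242880 and 10485760 page figures in the warning above are 20480 MB and 40960 MB, and the recommended 112288 pages is 438 MB.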
>
> \begin{naiveananlysis}
>
> The magic (or mess, depending on perspective) is mostly in "void
> swap_pager_swap_init(void)" in the file vm/swap_pager.c between lines
> 484 and 563 (in current which is same as stable/11 in this function).
> The diffs from known working r308356 to current show a diff at "@@
> -538,21 +518,25 @@" which has swpctrie_zone and swblk_zone computed
> based on maxswzone, then runs uma_zone_reserve_kva based on maxswzone
> and potentially reduces it.
>
> In the older code a "Swap zone entries reduced" message would be
> produced if uma_zone_reserve_kva cut back (which is moved and the
> message changed a bit).  But I didn't see this message so
> uma_zone_reserve_kva ran fine the first time without reducing "n".  In
> the new code swap_maxpages and swzone are then set.
> swapon_check_swzone gives the warning but does nothing as far as I can
> tell other than two printfs.  It doesn't appear to do any harm to have
> too much swap and ignore this warning (you just can't use it).
>
> There is a multiplier by SWAP_META_PAGES which is defined to be
> PCTRIE_COUNT which in sys/pctrie.h is defined as (1 << PCTRIE_WIDTH)
> and PCTRIE_WIDTH is 4 or 3 depending on __LP64__.
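
To make the multiplier concrete, here is a small sketch of the chain of definitions as described above. The SKETCH_ names are local stand-ins, not the kernel macros, and the widths are taken from the description in this message rather than from the headers themselves.

```c
#include <assert.h>

/* Local stand-ins mirroring the definitions described above. */
#ifdef __LP64__
#define SKETCH_PCTRIE_WIDTH	4
#else
#define SKETCH_PCTRIE_WIDTH	3
#endif
#define SKETCH_PCTRIE_COUNT	(1 << SKETCH_PCTRIE_WIDTH)
#define SKETCH_SWAP_META_PAGES	SKETCH_PCTRIE_COUNT

/* Number of swap pages tracked per swblk for a given trie width. */
unsigned int
pctrie_count(unsigned int width)
{
	assert(width < 32);
	return 1u << width;
}
```

So a width of 4 gives 16 pages per swblk and a width of 3 gives 8.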
>
> swap_pager_swap_init calculates swap_maxpages but swapon_check_swzone
> doesn't use it, calculating local variable maxpages (the same way)
> instead.  It seems that npages / SWAP_META_PAGES is related to
> what you'd want to set kern.maxswzone to.  If so, a better set of
> printfs might at least give better information.
>
> Note that VM_SWZONE_SIZE_MAX defaults to (276 * 128 * 1024) which
> would seem to be 128K * SWAP_META_PAGES * PAGE_SIZE = 4GB or 8GB
> (with 4K pages) depending on whether __LP64__ is defined, but that is
> for i386 only.  There is no definition of VM_SWZONE_SIZE_MAX for amd64
> unless it picks this up from i386 which apparently it does.
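
The arithmetic behind that default can be checked with a short sketch. It assumes 4096-byte pages and treats 276 as the per-entry size guess; both function names are hypothetical and this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* How many swblk entries fit in a zone of zone_bytes. */
uint64_t
zone_entries(uint64_t zone_bytes, uint64_t swblk_size)
{
	assert(swblk_size != 0);
	return zone_bytes / swblk_size;
}

/* Bytes of swap addressable by that many entries. */
uint64_t
swap_covered_bytes(uint64_t entries, uint64_t swap_meta_pages,
    uint64_t page_size)
{
	return entries * swap_meta_pages * page_size;
}
```

Under these assumptions, 276 * 128 * 1024 is 36175872 bytes, or 131072 entries, which covers 8 GiB of swap at 16 pages per entry and 4 GiB at 8 pages per entry.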
>
> One problem is the conditional for using maxswzone only allows the
> swzone size to be reduced and not increased.  Those people frobbing
> kern.maxswzone (including me for a time) were hopelessly wasting their
> time.
>
> Based on my math the old max swap was about 4 * available RAM.
>
> Naive or horribly naive?  I haven't tried this yet ... (compiles)
>
> \end{naiveananlysis}
>
> After some playing around I ended up with the diffs below.
>
> On the host with 8 GB RAM and back to 40 GB swap I got (indent and
> line continuation added):
>
>    warning: total configured swap (10485760 pages, 40960 MB) \
>      exceeds maximum recommended amount (8100744 pages, 31643 MB).
>    warning: increase kern.maxswzone from 0 to 275425296 \
>      or reduce amount of swap.
>
> After setting kern.maxswzone to 275425296 no complaint.
>
> Still needs more testing.  Currently reinstalling the full set of VMs.
> Later I'll try this on some of the VMs that have been configured with
> various memory and swap sizes.  Also will revert to the conditions
> that worked fine in prior versions and stopped working with 11.1.
>
> Curtis
>
>
> ps - about the diffs:
>
> On amd64 VM_SWZONE_SIZE_MAX is not defined.  On i386 it is defined
> based on a guess of sizeof(struct swblk).  On amd64 that size is 136
> and the guess on i386 is 276 so I made the guess a #define and have a
> comparison in the code to complain if the guess is off.
>
> The original code lets maxswzone decrease swapzone but not increase
> it, unlike prior code.  I put in a limit to how much it could be
> increased (but not sure even that is legitimate - why not more).  The
> replaced code does a printf on an attempt to set too high and reduces
> the value.
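
A minimal model of the sizing rule as the paragraph above describes it: default to half the page count, let kern.maxswzone lower or raise that, and cap any increase at four times the default. The function and parameter names are hypothetical, and this models the described behavior rather than reproducing the patch text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not the kernel code.  Returns the number of
 * swblk entries to size the zone for.
 */
unsigned long
swblk_entries(unsigned long page_count, unsigned long maxswzone,
    size_t swblk_size)
{
	unsigned long n, requested;

	assert(swblk_size != 0);
	n = page_count / 2;		/* default estimate */
	if (maxswzone == 0)
		return n;		/* tunable not set */
	requested = maxswzone / swblk_size;
	if (requested > 4 * n)
		return 4 * n;		/* cap the increase */
	return requested;		/* smaller or larger, within the cap */
}
```

For example, with 2097152 pages of RAM (8 GB at 4096 bytes per page) and a 136-byte swblk, a maxswzone of 275425296 yields 2025186 entries, while anything requesting more than 4194304 entries is capped there.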
>
> The swapon_check_swzone check gives much more useful information than
> it did before, including exactly what to set kern.maxswzone to in order
> to end up with the recommended twice the swzone space.
>
>
> Index: i386/include/param.h
> ===================================================================
> --- i386/include/param.h	(revision 331152)
> +++ i386/include/param.h	(working copy)
> @@ -133,7 +133,8 @@
>    * lower due to fragmentation.
>    */
>   #ifndef VM_SWZONE_SIZE_MAX
> -#define VM_SWZONE_SIZE_MAX	(276 * 128 * 1024)
> +#define SIZEOF_SWBLK_GUESS 276
> +#define VM_SWZONE_SIZE_MAX	(SIZEOF_SWBLK_GUESS * 128 * 1024)
>   #endif
>   
>   /*
> Index: vm/swap_pager.c
> ===================================================================
> --- vm/swap_pager.c	(revision 331152)
> +++ vm/swap_pager.c	(working copy)
> @@ -520,8 +520,16 @@
>   	 * on the number of pages in the system.
>   	 */
>   	n = vm_cnt.v_page_count / 2;
> -	if (maxswzone && n > maxswzone / sizeof(struct swblk))
> +	/* reduce size or make larger within limits */
> +	if (maxswzone && (n != maxswzone / sizeof(struct swblk))) {
> +		if (4 * n < maxswzone / sizeof(struct swblk)) {
> +			n *= 4;
> +			printf("kern.maxswzone (%lu) set too high: "
> +			       "limit is %lu\n", maxswzone,
> +			       n * sizeof(struct swblk));
> +		}
>   		n = maxswzone / sizeof(struct swblk);
> +	}
>   	swpctrie_zone = uma_zcreate("swpctrie", pctrie_node_size(), NULL, NULL,
>   	    pctrie_zone_init, NULL, UMA_ALIGN_PTR,
>   	    UMA_ZONE_NOFREE | UMA_ZONE_VM);
> @@ -2141,11 +2149,20 @@
>   
>   	/* recommend using no more than half that amount */
>   	if (npages > maxpages / 2) {
> -		printf("warning: total configured swap (%lu pages) "
> -		    "exceeds maximum recommended amount (%lu pages).\n",
> -		    npages, maxpages / 2);
> -		printf("warning: increase kern.maxswzone "
> -		    "or reduce amount of swap.\n");
> +		printf("warning: total configured swap (%lu pages, %lu MB) "
> +		       "exceeds maximum recommended amount (%lu pages, %lu MB).\n",
> +		       npages, swap_total / (1024*1024),
> +		       maxpages / 2, (maxpages / 2) * PAGE_SIZE / (1024*1024));
> +		printf("warning: increase kern.maxswzone from %lu to %lu "
> +		       "or reduce amount of swap.\n", maxswzone,
> +		       (maxpages / SWAP_META_PAGES) * 2
> +		       * sizeof(struct swblk));
> +#ifdef SIZEOF_SWBLK_GUESS
> +		if (SIZEOF_SWBLK_GUESS != sizeof(struct swblk))
> +			printf("warning: bad guess on swblk size: "
> +			       "%d != %lu\n",
> +			       SIZEOF_SWBLK_GUESS, sizeof(struct swblk));
> +#endif
>   	}
>   }
>   



