Date: Sat, 1 Feb 2014 22:16:12 +0100
From: Matthew Rezny <matthew@reztek.cz>
To: Richard Todd <rmtodd@servalan.servalan.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Tuning kern.maxswzone
Message-ID: <20140201221612.00001897@unknown>
In-Reply-To: <x7d2j64pvw.fsf@ichotolot.servalan.com>
References: <20140201070912.00007971@unknown> <x7d2j64pvw.fsf@ichotolot.servalan.com>

On Sat, 01 Feb 2014 13:03:15 -0600
Richard Todd <rmtodd@servalan.servalan.com> wrote:

> Matthew Rezny <matthew@reztek.cz> writes:
>
> > So, as best I can tell, the actual effective number used for
> > kern.maxswzone is indeed approximately 8x available RAM. If there is
> > some need to turn it down (using substantially less swap) then that
> > is possible, but turning it up (as suggested by the warning
> > message) is not possible. Setting any value higher does not appear
> > to actually
>
> Yeah, IIRC I ran into that when configing some VMs with
> really big swap space for benefit of tmpfs. This is the quick hack
> I used to get around that, you might give it a try.
>
> # HG changeset patch
> # Parent e4dd7df011139e2b224835aa6e330c90afcf9a55
> patch swap_pager to unconditionally use maxswzone tunable if it is
> set -- helpful for our tbvm VMs with large swap space for tmpfs
>
> diff -r e4dd7df01113 sys/vm/swap_pager.c
> --- a/sys/vm/swap_pager.c Wed Feb 20 00:15:49 2013 -0600
> +++ b/sys/vm/swap_pager.c Sat Feb 23 13:08:54 2013 -0600
> @@ -546,7 +546,7 @@
>   * is typically limited to around 32MB by default.
>   */
>   n = cnt.v_page_count / 2;
> - if (maxswzone && n > maxswzone / sizeof(struct swblock))
> + if (maxswzone)
>       n = maxswzone / sizeof(struct swblock);
>   n2 = n;
>   swap_zone = uma_zcreate("SWAPMETA", sizeof(struct swblock),
>       NULL, NULL,
>

Thank you for pointing me in the right direction. Now that I'm looking
at the right file, I don't know how I failed to find it myself with
grep. The logic here is rather obviously flawed: the maxswzone value is
only used if it is less than the calculated default. If there is some
reason not to allow adjusting this up, then the warning message is
incorrect. Either way, something should change, and I guess I should
file a PR on this one.

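To make the flaw concrete, here is a minimal userland sketch of the
sizing decision before and after the patch. It is not the kernel code:
the function names are made up for illustration, and the entry_size
parameter merely stands in for sizeof(struct swblock), whose real value
I have not checked.

```c
/*
 * Illustrative sketch only, not kernel code.
 * page_count stands in for cnt.v_page_count, entry_size for
 * sizeof(struct swblock). Both function names are hypothetical.
 */

/* Unpatched logic: the tunable is honoured only when it LOWERS n. */
static unsigned long
swzone_entries_unpatched(unsigned long page_count, unsigned long maxswzone,
    unsigned long entry_size)
{
	unsigned long n = page_count / 2;	/* calculated default */

	if (maxswzone && n > maxswzone / entry_size)
		n = maxswzone / entry_size;
	return (n);
}

/* Patched logic: the tunable is honoured whenever it is set. */
static unsigned long
swzone_entries_patched(unsigned long page_count, unsigned long maxswzone,
    unsigned long entry_size)
{
	unsigned long n = page_count / 2;	/* calculated default */

	if (maxswzone)
		n = maxswzone / entry_size;
	return (n);
}
```

With 1000 pages and a 100-byte entry, the default is 500 entries. A
tunable of 100000 bytes asks for 1000 entries: the unpatched version
silently stays at 500, while the patched one returns 1000. A tunable
that lowers the count takes effect in both versions.
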
At least I can see the warning is taking the doubling for safety into
account, so for my particular case with an overrun under 10% it should
probably be ok to put fixing this on the back burner.

> Also,
>
> > With /usr/src cleared (and after running fsck) I booted back into
> > 10-PRERELEASE to try to fetch the 10-STABLE sources again. I started
> > svnlite co and find it hung shortly thereafter. I tried a few times
> > but each time I see it does a couple hundred files at best and just
> > stops. When it stops, I can't login to another terminal. If I have
> > a spare console logged in, I can't run anything. After a few tries,
> > I managed to catch it where I had top running in one VT, started the
> > checkout, and then switched back just in time. I never could even
> > get top up with rm running (it probably blows over some limit
> > faster). When the checkout hangs, the state of svnlite is "kmem a"
> > according to top. I can only guess that is short for kmem alloc, I
> > guess some syscall is waiting on an allocation that will never
> > happen because something already is using or has leaked everything
> > that could satisfy that allocation. It looks like activity on too
> > many files within a short period runs something out.
>
> No, it's just a new bit of debugging code that causes the system
> to spend lots of CPU time verifying integrity of some of its internal
> data structures, especially on wimpy hardware (e.g., my dual PII/400
> box, which is where I noticed this recently.) You'll find if you're
> patient that it isn't a complete hang, it will actually get work done
> in between the debug passes. Set sysctl debug.vmem_check=0 to disable
> the check. This is I think completely independent from the maxswzone
> stuff, it's just you were seeing it for the first time since the
> debug code in question was only recently added to 10-STABLE.
>

If only it were that simple. I'm not yet on 10-STABLE; I'm struggling
to get the sources to reach that point.
These C3 boxes are 10-PRERELEASE, so I don't yet have this
debug.vmem_check to tweak. sysctl says "unknown oid" when I query it. I
tried setting it from the loader prompt, but sysctl still says "unknown
oid" and I see no change in behavior.

Also, I'm not seeing anything using lots of CPU time. If I start top on
another VT before I start svnlite, then I have a decent chance of
seeing what goes on until the situation becomes dire. svnlite starts
off moving quickly and using lots of CPU time (>50%) for the first
hundred files or so, then it halts in the "kmem a" state and the CPU is
completely idle. It sits there for a while, and eventually does
something more. Each time the stop is longer and less work is done in
the interval between stops. Eventually the process appears to hang
completely, in that I can leave it for half a day and no more progress
is made. The rm process could go longer in this state with still some
visible progress, since its operation is simple enough that I can
actually observe it getting something done, whereas svnlite might do
something occasionally but not enough to get to the next file in the
list.

With each burst and wait cycle, I see a spray in dmesg saying
"calcru: runtime went backwards [lots]" for {all processes}.

If I hit ctrl-c and wait, it'll interrupt svnlite the next time it
would do something other than wait. It can easily take 10 minutes for
that to happen, meaning the wait period is long enough to consider the
process as not effectively running. If I let it go long enough, it gets
to a state where it never exits on ctrl-c (or at least it'd take more
hours than I've been willing to wait). In this last attempt, I let it
go for 5 minutes, pressed ctrl-c, waited another 10 minutes, then tried
ctrl-alt-del, and that just beeps. So the attempt to interrupt the
process resulted in a total I/O lockup, to the point that key presses
are not handled.
That last one was enough to finally require a manual fsck on reboot
(which should be a testament to the resiliency of UFS, as I've pushed
the reset button a hundred times in the past day and a half).

> Richard
>

Thank you for taking the time to respond. It's now clear that maxswzone
is just a red herring; the real issue is the apparent hangs, which I'm
seeing on several more boxes. The mystery is why the one box with
slightly more RAM seems ok, but a couple of boxes with far more RAM are
not ok. That will probably be answered when I figure out what the cause
is.