Date: Fri, 7 Feb 2014 08:28:39 +0100
From: Matthew Rezny <matthew@reztek.cz>
To: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Cc: Adrian Chadd
Subject: Re: Tuning kern.maxswzone is minor compared to hangs in "kmem a" state
Message-ID: <20140207082839.00001a3a@unknown>

On Sun, 2 Feb 2014 15:59:57 -0800 Adrian Chadd wrote:

> [snip]
>
> So next time this happens, run "procstat -kka" - this will dump out
> the processes and a basic function call trace for each of them.
>
> I'll see if there's a way to teach procstat to output line numbers if
> the kernel debug image is available, but generally that's enough to
> then pass to kgdb to figure out the line number.
>
>
> -a

I'm not sure I would be able to run procstat, given that top and reboot
fail to run once the kernel is up against the limit. Fortunately, I
haven't had a chance to even try: I found the solution shortly after my
last message, though it took some time to verify. The solution is to
increase vm.kmem_size to give the kernel some room to grow. That is one
of the first things to tune for ZFS, which I had just started dealing
with on i386 on a pair of less obsolete boxes, and it struck me that
maybe I should look into the same parameter here, one I never had
occasion to touch before.

On the box with 384MB RAM, vm.kmem_size was defaulting to 120.5MB, but
on the box with 256MB RAM it was defaulting to only 80MB. It appears
the default is simply 1/3 of available RAM at boot. Presumably there is
some lower bound; if so, that lower bound is no longer sufficient with
all else at defaults. I set vm.kmem_size="120M" in loader.conf (the
exact line is shown below) and after rebooting I saw an immediate world
of difference. I could run svnlite status and it completed. I put the
box through its paces: svn up, buildworld and kernel, installworld and
kernel (so now running 10-STABLE), svn up again, buildworld -DNO_CLEAN
(still takes half a day, but that's far less than 2 days), installworld
again. In other words, it was back to how it had been while running
9-STABLE.
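For anyone else hitting this on a small i386 box, the change really is
just one line in /boot/loader.conf. The 120M value is what I used here;
pick something appropriate for the RAM in your machine:

    # /boot/loader.conf
    # Raise the kernel memory limit. 120M is roughly what the 384MB box
    # already defaulted to, and it proved to be enough headroom for the
    # 256MB box as well.
    vm.kmem_size="120M"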
I am perfectly happy to give more than half the system memory to the
kernel to have it stable. While that box was on the second run through
building world, I decided to collect some numbers. I set up a temporary
test VM in VirtualBox (on a much faster machine) configured with no
hard drive image, booting from ISO. I used both the 9.2 and 10.0
release ISOs for i386 and went through configurations with 128, 192,
256, and 384MB of RAM. For each OS and RAM combination, I booted the
system and collected the entire output of sysctl. I did no stress
testing, just collected default values.

Comparing the data, I found that while many other values changed,
vm.kmem_size stays the same from 9.2 to 10.0 at any given system memory
size. I also observed that the lower bound (if any) takes effect well
below my smallest box, since the 1/3 trend continued all the way down
to the smallest test VM, which has only half the RAM of that box. It
seems clear that with default settings on 10.0, vm.kmem_size is
undersized for low-memory machines even running UFS. It seems to be
common knowledge that this needs to go up for ZFS, but it is complete
news to me that it needs to go up for UFS. I have not dug deep enough
to determine whether there is a single culprit or whether multiple
subsystems have grown more memory hungry over time. From the one panic
I got, UFS with soft updates appears to be at least one of them. The
extreme sluggishness, if not complete hang or outright failure, of disk
I/O is a symptom consistent with the hypothesis that it has been UFS
hitting an allocation limit in the kernel each time.

Not only is it news to me that this might need to be increased on a UFS
system, the whole existence of vm.kmem_size is news to me, since I
never had to touch it in the past. On amd64, I see it is simply set to
physical RAM, which makes sense: the kernel can grow to fill RAM but
not beyond (kernel memory shouldn't be swapped). Obviously it can't
simply equal physical RAM on i386, where it can't exceed 1GB unless
KVA_PAGES is increased. I don't understand why it isn't simply set to
min(phys_mem, 1GB) by default. I can understand having a tunable to
limit the kernel from growing for cases where the administrator knows
best for some very specific working set. In the general case, I expect
the kernel to maintain the optimal balance between itself and user
programs, and I expect the ideal balance to vary between workloads.
Having a hard limit on kernel size only limits the kernel's ability to
tune the system according to current conditions. Is there any benefit
to limiting kmem_size on i386? Is there any reason I should not simply
set vm.kmem_size=min(phys_mem, 1GB) on all my i386 boxes? For what
reason is it stated that KVA_PAGES needs to be increased when setting
kmem_size > 512MB, when the default for KVA_PAGES gives a 1GB kernel
memory space?
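To be concrete about what I mean by min(phys_mem, 1GB), here is a rough
sketch in plain sh, using only hw.physmem and shell arithmetic. The 1GB
cap is just the default i386 KVA limit discussed above, not an official
recommendation, and the output should be reviewed before appending it
to /boot/loader.conf:

    #!/bin/sh
    # Sketch only: compute min(physical RAM, 1GB) and print a
    # loader.conf line for it.
    phys=$(sysctl -n hw.physmem)        # physical memory in bytes
    cap=$((1024 * 1024 * 1024))         # 1GB, the default i386 KVA space
    [ "$phys" -gt "$cap" ] && phys=$cap
    echo "vm.kmem_size=\"$phys\""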