Date: Fri, 7 Feb 2014 08:28:39 +0100
From: Matthew Rezny <matthew@reztek.cz>
To: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Cc: Adrian Chadd
Subject: Re: Tuning kern.maxswzone is minor compared to hangs in "kmem a" state
Message-ID: <20140207082839.00001a3a@unknown>

On Sun, 2 Feb 2014 15:59:57 -0800 Adrian Chadd wrote:

> [snip]
>
> So next time this happens, run "procstat -kka" - this will dump out
> the processes and a basic function call trace for each of them.
>
> I'll see if there's a way to teach procstat to output line numbers if
> the kernel debug image is available, but generally that's enough to
> then pass to kgdb to figure out the line number.
>
>
> -a

I'm not sure I would be able to run procstat, given that top and reboot
fail to run once the kernel is up against the limit. Fortunately, I
haven't had a chance to even try: I found the solution shortly after my
last message, though it took some time to verify. The solution is to
increase vm.kmem_size to give the kernel some room to grow. That is one
of the first things to tune for ZFS, which I had just started dealing
with on i386 on a pair of less obsolete boxes, and it struck me that
maybe I should look into the same parameter here, one I never had
occasion to touch before.

On the box with 384MB RAM, vm.kmem_size was defaulting to 120.5MB, but
on the box with 256MB RAM it was defaulting to only 80MB. It appears
the default is simply 1/3 of available RAM at boot. Presumably there is
some lower bound; if so, that lower bound is no longer sufficient with
all else at defaults. I set vm.kmem_size="120M" in loader.conf (the
exact line is shown below) and after rebooting I saw an immediate world
of difference. I could run svnlite status and it completed. I put the
box through its paces: svn up, buildworld and kernel, installworld and
kernel (so now running 10-STABLE), svn up again, buildworld -DNO_CLEAN
(still takes half a day, but that's far less than 2 days), installworld
again. In other words, it was back to how it had been while running
9-STABLE.
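For anyone else hitting this on a small i386 box, the change really is
just one line in /boot/loader.conf. The 120M value is what I used here;
pick something appropriate for the RAM in your machine:

    # /boot/loader.conf
    # Raise the kernel memory limit. 120M is roughly what the 384MB box
    # already defaulted to, and it proved to be enough headroom for the
    # 256MB box as well.
    vm.kmem_size="120M"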
I am perfectly happy to give more than half the system memory to the
kernel to have it stable. While that box was on the second run through
building world, I decided to collect some numbers. I set up a temporary
test VM in VirtualBox (on a much faster machine) configured with no
hard drive image, booting from ISO. I used both the 9.2 and 10.0
release ISOs for i386 and went through configurations with 128, 192,
256, and 384MB of RAM. For each OS and RAM combination, I booted the
system and collected the entire output of sysctl. I did no stress
testing, just collected default values.

Comparing the data, I found that while many other values changed,
vm.kmem_size stays the same from 9.2 to 10.0 at any given system memory
size. I also observed that the lower bound (if any) takes effect well
below my smallest box, since the 1/3 trend continued all the way down
to the smallest test VM, which has only half the RAM of that box. It
seems clear that with default settings on 10.0, vm.kmem_size is
undersized for low-memory machines even running UFS. It seems to be
common knowledge that this needs to go up for ZFS, but it is complete
news to me that it needs to go up for UFS. I have not dug deep enough
to determine whether there is a single culprit or whether multiple
subsystems have grown more memory hungry over time. From the one panic
I got, UFS with soft updates appears to be at least one of them. The
extreme sluggishness, if not complete hang or outright failure, of disk
I/O is a symptom consistent with the hypothesis that it has been UFS
hitting an allocation limit in the kernel each time.

Not only is it news to me that this might need to be increased on a UFS
system, the whole existence of vm.kmem_size is news to me, since I
never had to touch it in the past. On amd64, I see it is simply set to
physical RAM, which makes sense: the kernel can grow to fill RAM but
not beyond (kernel memory shouldn't be swapped). Obviously it can't
simply equal physical RAM on i386, where it can't exceed 1GB unless
KVA_PAGES is increased. I don't understand why it isn't simply set to
min(phys_mem, 1GB) by default. I can understand having a tunable to
limit the kernel from growing for cases where the administrator knows
best for some very specific working set. In the general case, I expect
the kernel to maintain the optimal balance between itself and user
programs, and I expect the ideal balance to vary between workloads.
Having a hard limit on kernel size only limits the kernel's ability to
tune the system according to current conditions. Is there any benefit
to limiting kmem_size on i386? Is there any reason I should not simply
set vm.kmem_size=min(phys_mem, 1GB) on all my i386 boxes? For what
reason is it stated that KVA_PAGES needs to be increased when setting
kmem_size > 512MB, when the default for KVA_PAGES gives a 1GB kernel
memory space?
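To be concrete about what I mean by min(phys_mem, 1GB), here is a rough
sketch in plain sh, using only hw.physmem and shell arithmetic. The 1GB
cap is just the default i386 KVA limit discussed above, not an official
recommendation, and the output should be reviewed before appending it
to /boot/loader.conf:

    #!/bin/sh
    # Sketch only: compute min(physical RAM, 1GB) and print a
    # loader.conf line for it.
    phys=$(sysctl -n hw.physmem)        # physical memory in bytes
    cap=$((1024 * 1024 * 1024))         # 1GB, the default i386 KVA space
    [ "$phys" -gt "$cap" ] && phys=$cap
    echo "vm.kmem_size=\"$phys\""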