From: John Baldwin
To: freebsd-stable@freebsd.org
Cc: "Marc G. Fournier"
Date: Wed, 10 Jun 2009 14:12:35 -0400
Subject: Re: Server lock up: kern.maxswzone relate ...
Message-Id: <200906101412.35353.jhb@freebsd.org>
In-Reply-To: <20090610115351.V56412@hub.org>

On Wednesday 10 June 2009 11:04:48 am Marc G. Fournier wrote:
>
> I'm running a couple of brand new servers ... 32G of RAM, very little load
> on it right now, and this morning it locked up with that 'kern.maxswzone'
> error on the console ...
>
> The server is running a reasonably current 7.2-STABLE:
>
> FreeBSD pluto.hub.org 7.2-STABLE FreeBSD 7.2-STABLE #0: Sun May 31 14:48:04 ADT
>
> And top right now, with everything running, shows no swapping, 19G of
> Free memory, 9G of Inact memory ... no reason to do any serious amount of
> swapping.
>
> last pid: 32159; load averages: 0.12, 0.21, 0.47 up 0+10:57:56 11:53:39
> 573 processes: 1 running, 571 sleeping, 1 zombie
> CPU: 2.0% user, 0.0% nice, 1.2% system, 0.0% interrupt, 96.8% idle
> Mem: 1331M Active, 9446M Inact, 659M Wired, 35M Cache, 399M Buf, 19G Free
> Swap: 32G Total, 32G Free
>
> In fact, my other server (same config) has been up 9 days (they were put
> online 9 days ago), and top shows it doing a little bit of swapping, but,
> again, huge amounts of Inact memory:
>
> last pid: 26307; load averages: 0.36, 0.35, 0.36 up 9+17:03:48 11:57:54
> 680 processes: 2 running, 657 sleeping, 21 zombie
> CPU: 0.7% user, 0.0% nice, 0.4% system, 0.0% interrupt, 98.9% idle
> Mem: 2915M Active, 25G Inact, 778M Wired, 13M Cache, 399M Buf, 1771M Free
> Swap: 32G Total, 1044K Used, 32G Free
>
> So these servers right now are definitely not feeling any pain ...
>
> And, based on experiences with another server, I have my /boot/loader.conf
> set to:
>
> kern.maxswzone=67108864
>
> So, the question is ... what am I missing? Is there some magical formula
> for calculating maxswzone that 7.2 is missing? Some nagios plug-in I
> should be using to monitor ... what?
>
> Help?

There are changes in 8 that may help some; you could ask kib@ to MFC them.
They make the kernel kill a process when the zone limited by maxswzone is
exhausted, similar to what happens when you run out of swap space.

If you break into the debugger and get a crashdump, you can 1) verify that
you were swapping, and 2) calculate a better value for maxswzone.

The problem with making maxswzone really big is that it uses up wired memory
that can't be reused for anything else, so you don't want to blindly use the
maximum amount for the swap you have.

-- 
John Baldwin
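To make the wired-memory trade-off above a little more concrete, here is a back-of-the-envelope sketch in Python. It is not how the kernel sizes the zone. PAGES_PER_ENTRY (pages tracked by one swap metadata entry) and ENTRY_BYTES (wired bytes per entry) are hypothetical placeholder values, not figures taken from the FreeBSD 7.2 source, and min_zone_bytes and swap_covered are hypothetical helpers. Substitute real values (for example, ones worked out from a crashdump) before drawing any conclusions.

```python
# Back-of-the-envelope swap-metadata sizing for kern.maxswzone.
# ASSUMPTIONS (illustrative only, NOT taken from the FreeBSD source):
PAGE_SIZE = 4096        # assumed 4 KiB pages
PAGES_PER_ENTRY = 16    # hypothetical: pages tracked by one metadata entry
ENTRY_BYTES = 160       # hypothetical: wired bytes per metadata entry

MiB = 1024 ** 2
GiB = 1024 ** 3


def min_zone_bytes(swap_bytes: int) -> int:
    """Wired bytes needed to track all of swap_bytes, assuming every
    metadata entry is completely filled (so this is a lower bound)."""
    pages = swap_bytes // PAGE_SIZE
    entries = -(-pages // PAGES_PER_ENTRY)  # ceiling division
    return entries * ENTRY_BYTES


def swap_covered(zone_bytes: int) -> int:
    """Bytes of swap a zone of zone_bytes can describe, again assuming
    every metadata entry is completely filled."""
    entries = zone_bytes // ENTRY_BYTES
    return entries * PAGES_PER_ENTRY * PAGE_SIZE


if __name__ == "__main__":
    # 32G of swap, as in the quoted report.
    print(min_zone_bytes(32 * GiB) / MiB)  # 80.0 under these assumptions
    # kern.maxswzone=67108864 from the quoted /boot/loader.conf.
    print(swap_covered(67108864) / GiB)
```

Under these assumed numbers, fully tracking 32G of swap would wire about 80 MiB. Because the sketch assumes every entry is full, it is only a lower bound: if swapped pages are scattered sparsely across objects, each entry tracks fewer pages and more entries are needed, which is one reason a value computed from an actual crashdump is more trustworthy than a formula.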