From: John Baldwin
To: "Brad L. Chisholm"
Cc: freebsd-hackers@freebsd.org
Date: Thu, 11 Jan 2007 11:22:44 -0500
Subject: Re: Kernel hang on 6.x
Message-Id: <200701111122.45347.jhb@freebsd.org>
In-Reply-To: <20070111070426.GB52964@bsdone.bsdwins.com>

On Thursday 11 January 2007 02:04, Brad L. Chisholm wrote:
> On Wed, Jan 10, 2007 at 10:11:38PM -0500, John Baldwin wrote:
> > On Wednesday 10 January 2007 19:15, Brad L. Chisholm wrote:
> > >
> > > I notice the following in the vm.zone output captured just prior to
> > > a hang.  Does this value correspond to the swap_zone you were referring
> > > to?  This looks like a limit may have been reached.
> > >
> > > SWAPMETA: 288, 116519, 116519, 0, 116543
> >
> > yep, that's exactly the issue you are hitting.
> >
> > > I don't seem to be able to query kern.maxswzone on our 6.2-BETA2 image:
> > >
> > > # sysctl kern.maxswzone
> > > sysctl: unknown oid 'kern.maxswzone'
> > >
> > > Is it available in 6.x, or is it something newer?
> >
> > It's only a tunable, not available as a sysctl.  You can figure out the
> > current size from the vmstat output above, then do some math to figure
> > out a good guess to use based on how much swap it had in use when it
> > locked up.  For example, right now you have 116519 objects of size 288, so
> > 33557472 bytes allocated.  You said you die when 14 GB out of 64 total is
> > used, so you should probably try taking that value and multiplying it by
> > 64 / 14.  That gives a result of 153405586.  However, you really want to
> > round this up to a multiple of 288 (because the kernel rounds it down to
> > a multiple of 288), so I'd use a value of at least 153405792.  And yes,
> > that means you are setting aside a little over 146 MB of wired, physical
> > RAM just to hold metadata for your swap. :)
> >
>
> Excellent!  Increasing kern.maxswzone has indeed fixed the problem.  Can
> this value be auto-tuned better based upon the size of swap, or is it the
> particular swapping pattern caused by our environment that caused the
> default size to be insufficient?  In any case, the kernel printf you added
> recently should help make this much easier to diagnose in the future.
>
> Thanks for your help!
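The sizing arithmetic in the quoted reply can be written as a small helper. This is only an illustrative sketch in Python, not part of FreeBSD: `parse_swapmeta` and `suggest_maxswzone` are hypothetical names, the SIZE, LIMIT, USED, FREE, REQUESTS column order assumed for the 6.x `vm.zone` output is an assumption, and it reuses the thread's rough premise that swap metadata grows linearly with the amount of swap in use. Because `kern.maxswzone` is a loader tunable rather than a sysctl, the resulting value would go in `/boot/loader.conf` and take effect at the next boot.

```python
# Hypothetical helper (not part of FreeBSD): size the kern.maxswzone
# loader tunable from a SWAPMETA line of `sysctl vm.zone` output.
# Assumed column order for FreeBSD 6.x: SIZE, LIMIT, USED, FREE, REQUESTS.


def parse_swapmeta(line):
    """Return (size, limit, used) parsed from a 'SWAPMETA: ...' line."""
    name, _, rest = line.partition(":")
    if name.strip() != "SWAPMETA":
        raise ValueError("not a SWAPMETA line: %r" % line)
    size, limit, used, _free, _requests = (int(f) for f in rest.split(","))
    return size, limit, used


def suggest_maxswzone(line, swap_used_gb, swap_total_gb):
    """Scale the current SWAPMETA cap so it would cover all of swap.

    Assumes metadata demand grows linearly with the amount of swap in use,
    which is the same rough heuristic used in the thread.
    """
    size, limit, _used = parse_swapmeta(line)
    current_bytes = limit * size  # e.g. 116519 * 288 = 33557472
    # Integer scaling, truncated like the hand calculation (153405586).
    scaled = current_bytes * swap_total_gb // swap_used_gb
    # The kernel rounds the tunable down to a multiple of the item size,
    # so round up here instead (ceiling division, then multiply back).
    return -(-scaled // size) * size


if __name__ == "__main__":
    line = "SWAPMETA: 288, 116519, 116519, 0, 116543"
    size, limit, used = parse_swapmeta(line)
    print("SWAPMETA limit reached" if used >= limit else "SWAPMETA below limit")
    value = suggest_maxswzone(line, swap_used_gb=14, swap_total_gb=64)
    # Candidate line for /boot/loader.conf:
    print('kern.maxswzone="%d"' % value)  # kern.maxswzone="153405792"
    print("%.1f MiB of wired RAM" % (value / 2**20))  # 146.3 MiB of wired RAM
```

Rounding up to a multiple of the item size means the kernel's own round-down leaves the value unchanged, so the full intended amount is reserved.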

The kernel does make a guess, but it doesn't always guess right, and I think
there may be a bug that makes it always guess wrong with more than 32 GB of
swap. :)

-- 
John Baldwin