Date: Thu, 11 Jan 2007 02:04:26 -0500
From: "Brad L. Chisholm"
To: John Baldwin
Cc: "Brad L. Chisholm", freebsd-hackers@freebsd.org
Subject: Re: Kernel hang on 6.x
Message-ID: <20070111070426.GB52964@bsdone.bsdwins.com>
In-Reply-To: <200701102211.39412.jhb@freebsd.org>

On Wed, Jan 10, 2007 at 10:11:38PM -0500, John Baldwin wrote:
> On Wednesday 10 January 2007 19:15, Brad L. Chisholm wrote:
> >
> > I notice the following in the vm.zone output captured just prior to
> > a hang.
> > Does this value correspond to the swap_zone you were referring
> > to? This looks like a limit may have been reached.
> >
> > SWAPMETA: 288, 116519, 116519, 0, 116543
>
> yep, that's exactly the issue you are hitting.
>
> > I don't seem to be able to query kern.maxswzone on our 6.2-BETA2 image:
> >
> > # sysctl kern.maxswzone
> > sysctl: unknown oid 'kern.maxswzone'
> >
> > Is it available in 6.x, or is it something newer?
>
> It's only a tunable, not available as a sysctl. You can figure out the
> current size from the vmstat output above, then do some math to figure
> out a good guess to use based on how much swap it had in use when it
> locked up. For example, right now you have 116519 objects of size 288, so
> 33557472 bytes allocated. You said you die when 14 GB out of 64 total is
> used, so you should probably try taking that value and multiplying it by
> 64 / 14. That gives a result of 153405586. However, you really want to
> round this up to a multiple of 288 (because the kernel rounds it down to
> a multiple of 288), so I'd use a value of at least 153405792. And yes,
> that means you are setting aside a little over 146 MB of wired, physical
> RAM just to hold metadata for your swap. :)
>

Excellent! Increasing kern.maxswzone has indeed fixed the problem.

Can this value be auto-tuned better based upon the size of swap, or
is it the particular swapping pattern caused by our environment that
caused the default size to be insufficient?

In any case, the kernel printf you added recently should help make
this much easier to diagnose in the future.

Thanks for your help!

---
Brad Chisholm
blc@bsdwins.com
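
For anyone who wants to redo the sizing arithmetic quoted above with their own numbers, here is a minimal sketch in Python. The function names and parameters are hypothetical and only mirror the steps John describes (current zone bytes, scale by total swap over swap in use at the hang, round up to a multiple of the object size). It is not taken from the kernel sources.

```python
def round_up_to_multiple(value, multiple):
    """Round value up to the nearest multiple of `multiple` (both positive ints)."""
    return ((value + multiple - 1) // multiple) * multiple


def current_zone_bytes(objects_in_use, object_size):
    """Bytes currently allocated in the zone: object count times object size."""
    return objects_in_use * object_size


def suggest_maxswzone(objects_in_use, object_size, swap_used_gb, swap_total_gb):
    """Estimate a kern.maxswzone value following the steps in the quoted mail.

    Assumption: swap metadata usage grows roughly linearly with swap in use,
    so the bytes seen at the hang are scaled by swap_total_gb / swap_used_gb.
    """
    allocated = current_zone_bytes(objects_in_use, object_size)
    # Integer division truncates, matching the 153405586 figure in the mail.
    scaled = allocated * swap_total_gb // swap_used_gb
    # Round up, since the kernel is said to round down to a multiple
    # of the object size.
    return round_up_to_multiple(scaled, object_size)


if __name__ == "__main__":
    # Figures from the SWAPMETA line and the 14 GB / 64 GB swap usage above.
    value = suggest_maxswzone(116519, 288, 14, 64)
    print("allocated now:", current_zone_bytes(116519, 288))
    print("suggested kern.maxswzone:", value)
    print("approx MB of wired RAM:", round(value / 2**20, 1))
```

With the figures from this thread the sketch reproduces 33557472 bytes currently allocated and a suggested value of 153405792. Since kern.maxswzone is a boot-time tunable rather than a sysctl, the result would normally go in /boot/loader.conf; treat the linear scaling as a first guess, not a guarantee.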