From: "Brad L. Chisholm"
To: John Baldwin
Cc: "Brad L. Chisholm", freebsd-hackers@freebsd.org
Date: Wed, 10 Jan 2007 19:15:34 -0500
Subject: Re: Kernel hang on 6.x
Message-ID: <20070111001534.GA319@bsdone.bsdwins.com>
In-Reply-To: <200701101753.24716.jhb@freebsd.org>

On Wed, Jan 10, 2007 at 05:53:24PM -0500, John Baldwin wrote:
> On Wednesday 10 January 2007 16:52, Brad L. Chisholm wrote:
> >
> > I work with Brian, and have been helping him analyze this problem. We have
> > been able to generate kernel dumps, and have also done some additional
> > analysis under ddb.
> > Here is a summary of our analysis so far. Suggestions
> > as to how to proceed from here are most welcome.
>
> How much swap do you have? You might have run out of buckets in the
> swap_zone before you ran out of swap space, in which case the kernel
> deadlocks rather than killing the hog like it does when it runs out of
> swap space. I added a printf to catch this on HEAD recently that will
> be MFC'd soonish. You can try bumping up kern.maxswzone (loader tunable).
>

The machine has a 32GB swap partition. We have also run it with an
additional 32GB swap file, for a total of 64GB. Changing the amount of
swap did not seem to affect the hang. However, as I mentioned in my
previous post, the hang always appears to occur when ~14GB of swap has
been consumed, regardless of the amount of swap or physmem configured.
That does make it sound like a limit (such as the swap_zone buckets) is
being reached.

I notice the following in the vm.zone output captured just prior to a
hang. Is this the swap_zone you were referring to? The columns are size,
limit, used, free, and requests, so the zone appears to be at its limit,
with all 116519 items in use and none free:

  SWAPMETA:     288,   116519,   116519,        0,   116543

I don't seem to be able to query kern.maxswzone on our 6.2-BETA2 image:

  # sysctl kern.maxswzone
  sysctl: unknown oid 'kern.maxswzone'

Is it available in 6.x, or is it something newer?

Thanks!

---
Brad Chisholm
blc@bsdwins.com
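As a rough cross-check of the SWAPMETA line against the ~14GB figure, here is a back-of-the-envelope sketch. It rests on assumptions that have not been checked against the 6.x swap pager source: each SWAPMETA item is taken to be a 32-byte header followed by one 8-byte daddr_t slot per page (amd64), pages are assumed to be 4 KiB, and the vm.zone columns are read as size, limit, used, free, requests. The helper names `pages_per_item` and `swap_covered_bytes` are hypothetical.

```python
# Back-of-the-envelope sketch; the item layout below is an ASSUMPTION,
# inferred from the 288-byte item size, not taken from the kernel source.
HEADER_BYTES = 32   # assumed: three 8-byte fields plus a 4-byte count, padded to 8
SLOT_BYTES = 8      # assumed: one 64-bit daddr_t per page slot on amd64
PAGE_BYTES = 4096   # assumed: 4 KiB pages


def pages_per_item(item_size: int) -> int:
    """Infer how many page slots one SWAPMETA item holds (hypothetical helper)."""
    slots, rem = divmod(item_size - HEADER_BYTES, SLOT_BYTES)
    if rem:
        raise ValueError("item size does not fit the assumed layout")
    return slots


def swap_covered_bytes(item_size: int, limit: int) -> int:
    """Upper bound on swap the zone can track if every slot of every item is used."""
    return limit * pages_per_item(item_size) * PAGE_BYTES


# Values from the vm.zone line: SWAPMETA: 288, 116519, 116519, 0, 116543
size, limit, used, free, requests = 288, 116519, 116519, 0, 116543
covered = swap_covered_bytes(size, limit)
print(f"{pages_per_item(size)} pages/item, zone full: {used == limit and free == 0}")
print(f"max trackable swap ~ {covered / 2**30:.1f} GiB")
```

Under those assumptions the zone limit works out to about 14.2 GiB of swap, which would line up with the point at which the hang is observed. Since it is an upper bound (partially filled items track less), it only suggests, rather than proves, that the zone limit is the culprit.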