From: Eric Anderson
To: Eric Anholt
Cc: David G Lawrence, current@freebsd.org
Date: Wed, 16 May 2007 07:24:31 -0500
Subject: Re: swap_pager_swap_init panic
Message-ID: <464AF7FF.9010808@freebsd.org>

On 05/05/07 17:40, Eric Anholt wrote:
> On Fri, 2007-05-04 at 20:36 -0700, David G Lawrence wrote:
>>> I've got an SMP netbooting test machine, which panics on startup almost
>>> 100% of the time with the issue that has been reported since 2006-12-02
>>> at least:
>>> db> where
>>> Tracing pid 40 tid 100040 td 0xffffff003b9e24c0
>>> kdb_enter() at kdb_enter+0x2f
>>> panic() at panic+0x291
>>> swap_pager_swap_init() at swap_pager_swap_init+0x20c
>> I had the same problem with one of my SMP machines. It's a curious problem
>> since I can take the same hard drive over to a slightly different SMP machine
>> and not see the panic. It appears to be a race of some kind in the VM system
>> initialization, right after the second CPU is started.
>> Try this patch out and let me know if it fixes it for you...
>>
>>
>> Index: uma_core.c
>> ===================================================================
>> RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
>> retrieving revision 1.119.2.19
>> diff -c -r1.119.2.19 uma_core.c
>> *** uma_core.c 11 Feb 2007 03:31:19 -0000 1.119.2.19
>> --- uma_core.c 1 Mar 2007 06:52:26 -0000
>> ***************
>> *** 1615,1621 ****
>>   #endif
>>       args.name = "UMA Zones";
>>       args.size = sizeof(struct uma_zone) +
>> !         (sizeof(struct uma_cache) * (mp_maxid + 1));
>>       args.ctor = zone_ctor;
>>       args.dtor = zone_dtor;
>>       args.uminit = zero_init;
>> --- 1615,1621 ----
>>   #endif
>>       args.name = "UMA Zones";
>>       args.size = sizeof(struct uma_zone) +
>> !         (sizeof(struct uma_cache) * (mp_maxid + 33));
>>       args.ctor = zone_ctor;
>>       args.dtor = zone_dtor;
>>       args.uminit = zero_init;
>>
>> Note that I don't claim this is a proper fix - it is just a work-around
>> that works for me.
>
> I've got the machine busy doing something else right now. What's this
> patch supposed to do? The panic clearly looks like a race to me -- the
> swapper's got the keg's recurse flag bumped (that printf in the
> backtrace I think was the one just after taking the recurse flag back
> down), and pagedaemon is checking for recursion on using the keg, and
> failing.
>

Anything more come of this? I'm seeing this too: AMD64, recent -CURRENT,
SMP, Core 2 Duo, NFS booting.

Eric