Date: Mon, 6 Dec 2004 14:59:34 +0100
From: Peter Holm
To: John Baldwin
Cc: freebsd-current@FreeBSD.org, bmilekic@FreeBSD.org, jeffr@FreeBSD.org,
    jroberson@chesapeake.net
Subject: Re: Freeze
Message-ID: <20041206135934.GA24238@peter.osted.lan>
In-Reply-To: <200411191710.19215.jhb@FreeBSD.org>
References: <20041112123343.GA12048@peter.osted.lan>
    <200411151546.15533.jhb@FreeBSD.org>
    <20041119075924.GA22320@peter.osted.lan>
    <200411191710.19215.jhb@FreeBSD.org>

On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > >
> > > > The following stack traces etc. were done before my first
> > > > cup of coffee, so they're not as informative as they could
> > > > have been :-(
> > > >
> > > > The test box appeared to have been frozen for more than 6 hours,
> > > > but was pingable.
> > > >
> > > > http://www.holm.cc/stress/log/cons86.html
> > >
> > > A weak guess is that you have the system in some sort of livelock
> > > due to fork()?  Have you tried running with 'debug.mpsafevm=1' set
> > > from the loader?
> > >
> > > --
> > > John Baldwin <>< http://www.FreeBSD.org/~jhb/
> > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> >
> > OK, I've got some more info:
> >
> > http://www.holm.cc/stress/log/cons88.html
> >
> > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
>
> Yes, I think if you specify M_WAITOK, then that might happen.
> slab_zalloc() can fail if, for example, any of the init functions
> fail, in which case it would loop forever.  You can try this hack
> (though it may very well be wrong) to return failure if that is what
> is triggering:
>
> Index: uma_core.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> retrieving revision 1.110
> diff -u -r1.110 uma_core.c
> --- uma_core.c	6 Nov 2004 11:43:30 -0000	1.110
> +++ uma_core.c	19 Nov 2004 22:08:26 -0000
> @@ -1998,6 +1998,10 @@
>  		 */
>  		if (flags & M_NOWAIT)
>  			flags |= M_NOVM;
> +
> +		/* XXXHACK */
> +		if (flags & M_WAITOK)
> +			break;
>  	}
>  	return (slab);
>  }
>
> --
> John Baldwin <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org

I instrumented the code with this:

$ cvs diff -u
cvs diff: Diffing .
Index: uma_core.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
retrieving revision 1.110
diff -u -r1.110 uma_core.c
--- uma_core.c	6 Nov 2004 11:43:30 -0000	1.110
+++ uma_core.c	6 Dec 2004 13:49:36 -0000
@@ -1926,6 +1926,7 @@
 {
 	uma_slab_t slab;
 	uma_keg_t keg;
+	int i;
 
 	keg = zone->uz_keg;
 
@@ -1943,7 +1944,8 @@
 
 	slab = NULL;
 
-	for (;;) {
+	for (i = 0;;i++) {
+		KASSERT(i < 10000, ("uma_zone_slab is looping"));
 		/*
 		 * Find a slab with some space.  Prefer slabs that are partially
 		 * used over those that are totally full.  This helps to reduce

While testing Jeff Roberson's "SMP FFS" patch, the assert triggered:

http://www.holm.cc/stress/log/cons92.html

--
Peter Holm
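
The failure mode discussed in this thread can be modelled in userland C.
This is only an illustrative sketch, not the code in uma_core.c: every
sk_ name, the flag values, and the stand-in allocator are assumptions
made up for the example, and the real loop handles M_NOWAIT differently
(it sets M_NOVM rather than breaking directly).  It shows a retry loop
that would never terminate for a waiting caller whose allocator keeps
failing, and how an iteration bound (the role the KASSERT plays in the
diff) detects that condition instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; not the kernel's definitions. */
#define SK_NOWAIT	0x1
#define SK_WAITOK	0x2

/* Same bound as the KASSERT in the instrumented diff. */
#define SK_LOOP_LIMIT	10000

struct sk_slab {
	int id;
};

struct sk_zone {
	int failures_left;	/* < 0: the allocator fails forever */
	struct sk_slab slab;
};

/* Stand-in for an allocator that can fail, e.g. when an init hook fails. */
static struct sk_slab *
sk_slab_alloc(struct sk_zone *zone)
{
	if (zone->failures_left < 0)
		return (NULL);
	if (zone->failures_left > 0) {
		zone->failures_left--;
		return (NULL);
	}
	return (&zone->slab);
}

/*
 * Simplified retry loop.  A waiting caller retries until the allocator
 * succeeds; *looping is set when the iteration bound is reached, which
 * is the point where the KASSERT would have fired.
 */
static struct sk_slab *
sk_zone_slab(struct sk_zone *zone, int flags, bool *looping)
{
	struct sk_slab *slab = NULL;
	int i;

	assert(zone != NULL && looping != NULL);
	*looping = false;

	for (i = 0;; i++) {
		if (i >= SK_LOOP_LIMIT) {
			*looping = true;
			break;
		}
		slab = sk_slab_alloc(zone);
		if (slab != NULL)
			break;
		if (flags & SK_NOWAIT)
			break;	/* non-waiting callers get NULL back */
		/* SK_WAITOK: go around again; nothing here can end the loop */
	}
	return (slab);
}
```

With an allocator that fails permanently, a call with SK_WAITOK reaches
the bound and reports looping, while a call with SK_NOWAIT returns NULL
after one attempt.  A transient failure (a few failed attempts followed
by success) returns the slab without tripping the bound.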