Date:      Thu, 16 Dec 2004 15:21:44 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        Peter Holm <peter@holm.cc>
Cc:        jeffr@FreeBSD.org
Subject:   Re: Freeze
Message-ID:  <200412161521.44026.jhb@FreeBSD.org>
In-Reply-To: <20041206135934.GA24238@peter.osted.lan>
References:  <20041112123343.GA12048@peter.osted.lan> <200411191710.19215.jhb@FreeBSD.org> <20041206135934.GA24238@peter.osted.lan>

On Monday 06 December 2004 08:59 am, Peter Holm wrote:
> On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> > On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > > >
> > > > > The following stack traces etc. were captured before my first
> > > > > cup of coffee, so they're not as informative as they could have been :-(
> > > > >
> > > > > The test box appeared to have been frozen for more than 6 hours,
> > > > > but was pingable.
> > > > >
> > > > > http://www.holm.cc/stress/log/cons86.html
> > > >
> > > > A weak guess is that you have the system in some sort of livelock due
> > > > to fork()?  Have you tried running with 'debug.mpsafevm=1' set from
> > > > the loader?
> > > >
> > > > --
> > > > John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> > > > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
> > >
> > > OK, I've got some more info:
> > >
> > > http://www.holm.cc/stress/log/cons88.html
> > >
> > > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
> >
> > Yes, I think if you specify M_WAITOK, then that might happen.
> > slab_zalloc() can fail if, for example, any of the init functions fail, in
> > which case it would loop forever.  You can try this hack (though it may
> > very well be wrong) to return failure if that is what is triggering:
> >
> > Index: uma_core.c
> > ===================================================================
> > RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> > retrieving revision 1.110
> > diff -u -r1.110 uma_core.c
> > --- uma_core.c	6 Nov 2004 11:43:30 -0000	1.110
> > +++ uma_core.c	19 Nov 2004 22:08:26 -0000
> > @@ -1998,6 +1998,10 @@
> >  		 */
> >  		if (flags & M_NOWAIT)
> >  			flags |= M_NOVM;
> > +
> > +		/* XXXHACK */
> > +		if (flags & M_WAITOK)
> > +			break;
> >  	}
> >  	return (slab);
> >  }
> >
> > --
> > John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> > "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
>
> I instrumented the code with this:
> $ cvs diff -u
> cvs diff: Diffing .
> Index: uma_core.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
> retrieving revision 1.110
> diff -u -r1.110 uma_core.c
> --- uma_core.c  6 Nov 2004 11:43:30 -0000       1.110
> +++ uma_core.c  6 Dec 2004 13:49:36 -0000
> @@ -1926,6 +1926,7 @@
>  {
>         uma_slab_t slab;
>         uma_keg_t keg;
> +       int i;
>
>         keg = zone->uz_keg;
>
> @@ -1943,7 +1944,8 @@
>
>         slab = NULL;
>
> -       for (;;) {
> +       for (i = 0;;i++) {
> +               KASSERT(i < 10000, ("uma_zone_slab is looping"));
>                 /*
>                  * Find a slab with some space.  Prefer slabs that are partially
>                  * used over those that are totally full.  This helps to reduce
>
> and now, during a test of Jeff Roberson's "SMP FFS" patch, the assert
> triggered: http://www.holm.cc/stress/log/cons92.html

Hmm.  Does the hack patch above make the hang go away, or does it just make
things worse?
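
For reference, here is a rough user-space sketch of the pattern being
discussed: a retry loop around an allocator that may keep failing, with a spin
bound like the one in the instrumented KASSERT.  This is not the uma_core.c
code.  fake_zone, fake_slab_alloc(), fake_zone_slab(), F_WAITOK, F_NOWAIT and
MAX_SPINS are all hypothetical names made up for illustration, and the flag
handling is a simplification of what the real loop does.

```c
#include <assert.h>
#include <stddef.h>

#define F_WAITOK  0x1    /* hypothetical stand-in for M_WAITOK */
#define F_NOWAIT  0x2    /* hypothetical stand-in for M_NOWAIT */
#define MAX_SPINS 10000  /* same bound as the KASSERT quoted above */

/* Hypothetical zone: the allocator fails failures_left times, then succeeds. */
struct fake_zone {
	int failures_left;
	int slab_storage;
};

/* Stand-in for an allocation step that can fail (e.g. an init failure). */
static void *
fake_slab_alloc(struct fake_zone *z)
{
	if (z->failures_left > 0) {
		z->failures_left--;
		return (NULL);
	}
	return (&z->slab_storage);
}

/*
 * Retry until a slab is found.  A F_NOWAIT caller gives up after the first
 * failure.  A F_WAITOK caller keeps retrying, but the loop is bounded by
 * MAX_SPINS so a persistent failure returns NULL instead of spinning
 * forever.  *spins reports how many failed attempts were made.
 */
static void *
fake_zone_slab(struct fake_zone *z, int flags, int *spins)
{
	void *slab = NULL;
	int i;

	assert(z != NULL && spins != NULL);
	for (i = 0; i < MAX_SPINS; i++) {
		slab = fake_slab_alloc(z);
		if (slab != NULL || (flags & F_NOWAIT))
			break;
	}
	*spins = i;
	return (slab);
}
```

With a transient failure the F_WAITOK caller eventually gets a slab; with a
persistent one the unbounded version of this loop would never return, which is
the behaviour the assert is meant to catch.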

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


