From: Bosko Milekic
To: John Baldwin
cc: jroberson@chesapeake.net, jeffr@freebsd.org, freebsd-current@freebsd.org
Date: Mon, 27 Dec 2004 23:33:27 -0500
Subject: Re: Freeze
Message-ID: <20041228043327.GA96744@technokratis.com>
In-Reply-To: <200412271705.31625.jhb@FreeBSD.org>
References: <20041112123343.GA12048@peter.osted.lan> <200412161521.44026.jhb@FreeBSD.org> <20041220110411.GA87750@peter.osted.lan> <200412271705.31625.jhb@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=159278+0+current/freebsd-current
(and see previous in thread for context).

Let's hope.
-Bosko

On Mon, Dec 27, 2004 at 05:05:31PM -0500, John Baldwin wrote:
> On Monday 20 December 2004 06:04 am, Peter Holm wrote:
> > On Thu, Dec 16, 2004 at 03:21:44PM -0500, John Baldwin wrote:
> > > On Monday 06 December 2004 08:59 am, Peter Holm wrote:
> > > > On Fri, Nov 19, 2004 at 05:10:19PM -0500, John Baldwin wrote:
> > > > > On Friday 19 November 2004 02:59 am, Peter Holm wrote:
> > > > > > On Mon, Nov 15, 2004 at 03:46:15PM -0500, John Baldwin wrote:
> > > > > > > On Friday 12 November 2004 07:33 am, Peter Holm wrote:
> > > > > > > > GENERIC HEAD from Nov 11 08:05 UTC
> > > > > > > >
> > > > > > > > The following stack traces etc. was done before my first
> > > > > > > > cup of coffee, so it's not so informative as it could have been
> > > > > > > > :-(
> > > > > > > >
> > > > > > > > The test box appeared to have been frozen for more than 6
> > > > > > > > hours, but was pingable.
> > > > > > > >
> > > > > > > > http://www.holm.cc/stress/log/cons86.html
> > > > > > >
> > > > > > > A weak guess is that you have the system in some sort of livelock
> > > > > > > due to fork()?  Have you tried running with 'debug.mpsafevm=1'
> > > > > > > set from the loader?
> > > > > > >
> > > > > > > --
> > > > > > > John Baldwin <>< http://www.FreeBSD.org/~jhb/
> > > > > > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> > > > > >
> > > > > > OK, I've got some more info:
> > > > > >
> > > > > > http://www.holm.cc/stress/log/cons88.html
> > > > > >
> > > > > > Looks like a spin in uma_zone_slab() when slab_zalloc() fails?
> > > > >
> > > > > Yes, I think if you specify M_WAITOK, then that might happen.
> > > > > slab_zalloc() can fail if any of the init functions fail for example,
> > > > > in which case it would loop forever.
> > > > > You can try this hack (though
> > > > > it may very well be wrong) to return failure if that is what is
> > > > > triggering:
> > > > >
> > > > > Index: uma_core.c
> > > > > ===================================================================
> > > > > RCS file: /usr/cvs/src/sys/vm/uma_core.c,v
> > > > > retrieving revision 1.110
> > > > > diff -u -r1.110 uma_core.c
> > > > > --- uma_core.c  6 Nov 2004 11:43:30 -0000       1.110
> > > > > +++ uma_core.c  19 Nov 2004 22:08:26 -0000
> > > > > @@ -1998,6 +1998,10 @@
> > > > >                  */
> > > > >                 if (flags & M_NOWAIT)
> > > > >                         flags |= M_NOVM;
> > > > > +
> > > > > +               /* XXXHACK */
> > > > > +               if (flags & M_WAITOK)
> > > > > +                       break;
> > > > >         }
> > > > >         return (slab);
> > > > >  }
> > > > >
> > > > > --
> > > > > John Baldwin <>< http://www.FreeBSD.org/~jhb/
> > > > > "Power Users Use the Power to Serve" = http://www.FreeBSD.org
> > > >
> > > > I instrumented the code with this:
> > > > $ cvs diff -u
> > > > cvs diff: Diffing .
> > > > Index: uma_core.c
> > > > ===================================================================
> > > > RCS file: /home/ncvs/src/sys/vm/uma_core.c,v
> > > > retrieving revision 1.110
> > > > diff -u -r1.110 uma_core.c
> > > > --- uma_core.c  6 Nov 2004 11:43:30 -0000       1.110
> > > > +++ uma_core.c  6 Dec 2004 13:49:36 -0000
> > > > @@ -1926,6 +1926,7 @@
> > > >  {
> > > >         uma_slab_t slab;
> > > >         uma_keg_t keg;
> > > > +       int i;
> > > >
> > > >         keg = zone->uz_keg;
> > > >
> > > > @@ -1943,7 +1944,8 @@
> > > >
> > > >         slab = NULL;
> > > >
> > > > -       for (;;) {
> > > > +       for (i = 0;;i++) {
> > > > +               KASSERT(i < 10000, ("uma_zone_slab is looping"));
> > > >                 /*
> > > >                  * Find a slab with some space.  Prefer slabs that are partially
> > > >                  * used over those that are totally full.  This helps to reduce
> > > >
> > > > and now during test of Jeff Roberson's "SMP FFS" patch the assert
> > > > triggered: http://www.holm.cc/stress/log/cons92.html
> > >
> > > Hmm.
> > > Does the hack patch above make the hang go away or does it just
> > > break things worse?
> >
> > I have been testing your patch for quite a while. If it's OK for
> > m_getcl with M_TRYWAIT to return NULL, your patch revealed a missing
> > test for NULL in kern/uipc_socket.c:750
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x1c
> > fault code              = supervisor write, page not present
> > instruction pointer     = 0x8:0xc0647d77
> > stack pointer           = 0x10:0xcfa9cbf0
> > frame pointer           = 0x10:0xcfa9cc38
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 67417 (net)
> > [thread pid 67417 tid 100890 ]
> > Stopped at      sosend+0x227:   movl    $0,0x1c(%eax)
> > db> where
> > Tracing pid 67417 tid 100890 td 0xc1ae8000
> > sosend(c3454dec,0,cfa9cc90,0,0) at sosend+0x227
> > soo_write(c1db9374,cfa9cc90,c1aa6180,0,c1ae8000) at soo_write+0x2d
> > dofilewrite(3,bfbfe740,400,ffffffff,ffffffff) at dofilewrite+0x99
> > write(c1ae8000,cfa9cd14,3,d,246) at write+0x48
> > syscall(2f,bfbf002f,bfbf002f,3,bfbfe740) at syscall+0x128
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > --- syscall (4, FreeBSD ELF32, write), eip = 0x280bfbf7, esp = 0xbfbfe71c, ebp = 0xbfbfeb68 ---
>
> Hmm, it looks like M_TRYWAIT isn't allowed to return NULL.  Unfortunately, I
> think my hack basically lets UMA return NULL instead of spinning forever, so
> it's not that useful.  I'm not sure how to really fix this problem in UMA.
>
> --
> John Baldwin <>< http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve" = http://www.FreeBSD.org

-- 
Bosko Milekic
bmilekic@technokratis.com
bmilekic@FreeBSD.org