From owner-freebsd-arch@FreeBSD.ORG Tue Oct 5 13:03:18 2004
Date: Tue, 5 Oct 2004 15:03:08 +0200
From: Peter Holm
To: Julian Elischer
Message-ID: <20041005130308.GA2586@peter.osted.lan>
References: <1095468747.31297.241.camel@palm.tree.com>
	<1096496057.3733.2163.camel@palm.tree.com>
	<1096603981.21577.195.camel@palm.tree.com>
	<200410041131.35387.jhb@FreeBSD.org>
	<1096911278.44307.17.camel@palm.tree.com>
	<20041004184939.GA8178@peter.osted.lan>
	<41619D29.1000704@elischer.org>
	<20041004191410.GA8423@peter.osted.lan>
	<4161A7BD.3040706@elischer.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="NzB8fVQJ5HfG6fxh"
Content-Disposition: inline
In-Reply-To: <4161A7BD.3040706@elischer.org>
User-Agent: Mutt/1.4.1i
cc: Peter Holm
cc: Stephan Uphoff
cc: "freebsd-arch@freebsd.org"
Subject: Re: scheduler (sched_4bsd) questions
List-Id: Discussion related to FreeBSD architecture
X-List-Received-Date: Tue, 05 Oct 2004 13:03:19 -0000

--NzB8fVQJ5HfG6fxh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Mon, Oct 04, 2004 at 12:42:53PM -0700, Julian Elischer wrote:

OK, I got a crash dump now, after a few modifications to kern_shutdown.c

There are however a few strange things worth noticing:

1) There is no panic string:

Mounted root from ufs:/dev/ad0s1a.
pid 1146: corrected slot count (2->1)
[thread 100796]
Stopped at sched_add+0x13: movl 0x14c(%esi),%ebx

2) The gdb stack trace gets a bit weird at:

#8  0xc07812da in calltrap () at ../../../i386/i386/exception.s:140
#9  0xc05f0018 in flock (td=0x0, uap=0x0) at ../../../kern/kern_descrip.c:2138
#10 0xc0619fd1 in setrunqueue (td=0xc2319180, flags=0x0) at kern_switch.c:521
#11 0xc061921f in sched_wakeup (td=0xc2319180) at ../../../kern/sched_4bsd.c:859

Where did flock() come from?

The full console output is at http://www.holm.cc/stress/log/cons82.html

- Peter

> ok, then if it happens again, from ddb, run
> show ktr
> after you've done the 'ps' and go back a couple of hundred events..
>
> thanks.
>
> Peter Holm wrote:
>
> >On Mon, Oct 04, 2004 at 11:57:45AM -0700, Julian Elischer wrote:
> >
> >>can you run ktrdump against the corefile and get the ktr output?
> >>(you do have it enabled right?)
> >>
> >
> >No, that's one of the problems: doadump() fails with this specific panic.
> >
> >- Peter
> >
> >>Peter Holm wrote:
> >>
> >>>On Mon, Oct 04, 2004 at 01:34:38PM -0400, Stephan Uphoff wrote:
> >>>
> >>>>On Mon, 2004-10-04 at 11:31, John Baldwin wrote:
> >>>>
> >>>>>On Friday 01 October 2004 12:13 am, Stephan Uphoff wrote:
> >>>>>
> >>>>>>On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote:
> >>>>>>
> >>>>>>>I was looking at the MUTEX_WAKE_ALL undefined case when I used the
> >>>>>>>critical section for turnstile_claim().
> >>>>>>>However there are bigger problems with MUTEX_WAKE_ALL undefined
> >>>>>>>so you are right - the critical section for turnstile_claim is pretty
> >>>>>>>useless.
> >>>>>>>
> >>>>>>Arghhh !!!
> >>>>>>
> >>>>>>MUTEX_WAKE_ALL is NOT an option in GENERIC.
> >>>>>>I recall verifying that it is defined twice. Guess I must have looked
> >>>>>>at the wrong source tree :-(
> >>>>>>This means yes - we have bigger problems!
> >>>>>>
> >>>>>>Example:
> >>>>>>
> >>>>>>Thread A holds a mutex x contested by Thread B and C and has priority
> >>>>>>pri(A).
> >>>>>>
> >>>>>>Thread C holds a mutex y and pri(B) < pri(C)
> >>>>>>
> >>>>>>Thread A releases the lock, wakes thread B, but leaves C on the
> >>>>>>turnstile wait queue.
> >>>>>>
> >>>>>>An interrupt thread I tries to lock mutex y owned by C.
> >>>>>>
> >>>>>>However priority inheritance does not work since B needs to run first
> >>>>>>to take ownership of the lock.
> >>>>>>
> >>>>>>I is blocked :-(
> >>>>>>
> >>>>>Ermm, if the interrupt happens after x is released then I's priority
> >>>>>should propagate from I to C to B.
> >>>>>
> >>>>There is a hole after the mutex x is released by A - but before B can
> >>>>claim the mutex.
> >>>>The turnstile for mutex x is unowned and interrupt
> >>>>thread I when trying to donate its priority will run into:
> >>>>
> >>>>    if (td == NULL) {
> >>>>        /*
> >>>>         * This really isn't quite right. Really
> >>>>         * ought to bump priority of thread that
> >>>>         * next acquires the lock.
> >>>>         */
> >>>>        return;
> >>>>    }
> >>>>
> >>>>So B needs to run and acquire the mutex before priority inheritance
> >>>>works again and does not get a priority boost to do so.
> >>>>
> >>>>This is easy to fix and MUTEX_WAKE_ALL can be removed again at that time
> >>>>- but my time budget is limited and Peter has an interesting bug left
> >>>>that has priority.
> >>>>
> >>>I'm no closer to being able to create this panic in a controlled way.
> >>>After a whole day of different tests I finally got this panic:
> >>>http://www.holm.cc/stress/log/cons81.html. The trigger seems to be one
> >>>particular Java applet, but it is not easily reproducible.
> >>>
> >>>- Peter
> >>>
> >>>>>If the interrupt happens before x is released,
> >>>>>then the final bit of propagate_priority() should handle it since it
> >>>>>re-sorts the turnstile's thread queue so that C will be awakened rather
> >>>>>than B.
> >>>>>
> >>>>Agreed.
> >>>>
> >>>>	Stephan
> >>>>
> >>>_______________________________________________
> >>>freebsd-arch@freebsd.org mailing list
> >>>http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> >>>To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
> >>>
> >
> >

-- 
Peter Holm

--NzB8fVQJ5HfG6fxh
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="kern_shutdown.diff"

Index: kern_shutdown.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_shutdown.c,v
retrieving revision 1.166
diff -u -r1.166 kern_shutdown.c
--- kern_shutdown.c	2 Sep 2004 18:59:15 -0000	1.166
+++ kern_shutdown.c	5 Oct 2004 12:23:45 -0000
@@ -230,10 +230,14 @@
 		return;
 	}
 
+	if (panicstr == NULL)
+		panicstr = "In doadump()"; /* Major hack XXX pho */
 	savectx(&dumppcb);
 	dumptid = curthread->td_tid;
 	dumping++;
 	dumpsys(&dumper);
+	if (!strcmp(panicstr, "In doadump()"))
+		panicstr = NULL; /* Major hack XXX pho */
 }
 
 /*
@@ -519,6 +523,8 @@
 #endif
 
 #ifdef KDB
+	if (panicstr == NULL)
+		panicstr = "(NULL)"; /* XXX pho */
 	if (newpanic && trace_on_panic)
 		kdb_backtrace();
 	if (debugger_on_panic)
--NzB8fVQJ5HfG6fxh--
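
The priority-inheritance hole described in the quoted thread (mutex x released by A but not yet claimed by B, while interrupt thread I blocks on mutex y held by C) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct layouts, the helper b_priority_after_contention() and the priority numbers 10/100/120 are hypothetical and are not the FreeBSD turnstile code. Only the "owner is NULL, so return" check mirrors the snippet quoted in the thread. Lower numbers mean more urgent, as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's turnstile structures. */
struct lock;

struct thread {
	const char *name;
	int pri;			/* lower value = more urgent */
	struct lock *blocked_on;	/* lock being waited for, or NULL */
};

struct lock {
	struct thread *owner;		/* NULL while between owners */
};

/*
 * Lend priority 'pri' along the chain of lock owners starting at 'lk'.
 * Returns the number of threads whose priority was raised.
 */
static int
propagate_priority(struct lock *lk, int pri)
{
	int boosted = 0;

	while (lk != NULL) {
		struct thread *td = lk->owner;

		if (td == NULL) {
			/* The hole: the lock is unowned, nobody to boost. */
			return (boosted);
		}
		if (td->pri <= pri)
			return (boosted);	/* already urgent enough */
		td->pri = pri;
		boosted++;
		lk = td->blocked_on;
	}
	return (boosted);
}

/*
 * Model the scenario from the thread: A has released x and woken B,
 * C still waits on x and holds y, and I now blocks on y.
 * 'claimed' says whether B has already taken ownership of x.
 * Returns B's priority after I donates its priority.
 */
static int
b_priority_after_contention(int claimed)
{
	struct lock x = { NULL }, y = { NULL };
	struct thread b = { "B", 100, NULL };
	struct thread c = { "C", 120, &x };
	struct thread i = { "I", 10, &y };

	y.owner = &c;
	x.owner = claimed ? &b : NULL;
	assert(c.blocked_on == &x);	/* sanity check on the setup */

	propagate_priority(i.blocked_on, i.pri);
	return (b.pri);
}
```

In this model, calling b_priority_after_contention(0) leaves B at its own priority, because propagation stops at the unowned lock x; calling it with 1 lets I's priority reach B through C. That is the window in which B has to run unboosted before inheritance works again.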