Date:      Tue, 05 Oct 2004 11:32:17 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Peter Holm <peter@holm.cc>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: scheduler (sched_4bsd) questions
Message-ID:  <4162E8B1.90803@elischer.org>
In-Reply-To: <20041005130308.GA2586@peter.osted.lan>
References:  <1095468747.31297.241.camel@palm.tree.com> <1096496057.3733.2163.camel@palm.tree.com> <1096603981.21577.195.camel@palm.tree.com> <200410041131.35387.jhb@FreeBSD.org> <1096911278.44307.17.camel@palm.tree.com> <20041004184939.GA8178@peter.osted.lan> <41619D29.1000704@elischer.org> <20041004191410.GA8423@peter.osted.lan> <4161A7BD.3040706@elischer.org> <20041005130308.GA2586@peter.osted.lan>



Peter Holm wrote:

>On Mon, Oct 04, 2004 at 12:42:53PM -0700, Julian Elischer wrote:
>
>OK, I got a crash dump now, after a few modifications to kern_shutdown.c
>
>There are however a few strange things worth noticing:
>
>1) There is no panic string:
>
>Mounted root from ufs:/dev/ad0s1a.
>pid 1146: corrected slot count (2->1)
>[thread 100796]
>Stopped at      sched_add+0x13: movl    0x14c(%esi),%ebx
>
>2) The gdb stack trace gets a bit weird at:
>
>#8  0xc07812da in calltrap () at ../../../i386/i386/exception.s:140
>#9  0xc05f0018 in flock (td=0x0, uap=0x0) at ../../../kern/kern_descrip.c:2138
>#10 0xc0619fd1 in setrunqueue (td=0xc2319180, flags=0x0) at kern_switch.c:521
>#11 0xc061921f in sched_wakeup (td=0xc2319180) at ../../../kern/sched_4bsd.c:859
>
>Where did flock() come from?
>

Probably just a partially initialised frame. ddb seems to have a good
trace, starting at setrunqueue().

There are two things to notice.
Firstly, the "corrected slot count (2->1)" message is still there (grumble).

This happens when a threaded process goes back to being an
unthreaded process.
For some reason, the number of openings is not being set back to 1 but
rather to 2.
I believe it is because, while in threaded mode, it is already too
high by some amount
(sometimes equivalent to NTHREAD), but I cannot see why.
Hopefully it is not a fatal problem (as it would be if it were too LOW),
but I hope to figure it
out soon (maybe another one for Stephan :-)

On the topic of the crash: the ktr shows no unexpected activity in the
time before the crash, no preemption
or similar.
It is possible that there was an interrupt, but nothing shows with
the ktr mask that was used.
Maybe you could compile in and use a few more bits in the ktr masks to
show process events and interrupts.

In the absence of unexpected happenings we must assume the kseg runq is
in an odd state before
it gets used in setrunqueue, leading to the panic.

I think I will check in some debug and cleanup stuff I have here..
maybe it will shake out something..

>
>The full console output is at http://www.holm.cc/stress/log/cons82.html
>
>- Peter
>
>  
>
>>ok, then  if it happens again,  from ddb, run
>>show ktr
>>after you've done the 'ps' and go back a couple of hundred events..
>>
>>thanks.
>>
>>
>>Peter Holm wrote:
>>
>>    
>>
>>>On Mon, Oct 04, 2004 at 11:57:45AM -0700, Julian Elischer wrote:
>>>
>>>
>>>      
>>>
>>>>Can you run ktrdump against the corefile and get the ktr output?
>>>>(you do have it enabled right?)
>>>>
>>>>  
>>>>
>>>>        
>>>>
>>>No, that's one of the problems: doadump() fails with this specific panic.
>>>
>>>- Peter
>>>
>>>
>>>
>>>      
>>>
>>>>Peter Holm wrote:
>>>>
>>>>  
>>>>
>>>>        
>>>>
>>>>>On Mon, Oct 04, 2004 at 01:34:38PM -0400, Stephan Uphoff wrote:
>>>>>
>>>>>
>>>>>    
>>>>>
>>>>>          
>>>>>
>>>>>>On Mon, 2004-10-04 at 11:31, John Baldwin wrote:
>>>>>>
>>>>>>
>>>>>>      
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>On Friday 01 October 2004 12:13 am, Stephan Uphoff wrote:
>>>>>>>  
>>>>>>>
>>>>>>>        
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>On Wed, 2004-09-29 at 18:14, Stephan Uphoff wrote:
>>>>>>>>    
>>>>>>>>
>>>>>>>>          
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>I was looking at the MUTEX_WAKE_ALL undefined case when I used the
>>>>>>>>>critical section for turnstile_claim().
>>>>>>>>>However there are bigger problems with MUTEX_WAKE_ALL undefined
>>>>>>>>>so you are right - the critical section for turnstile_claim is pretty
>>>>>>>>>useless.
>>>>>>>>>      
>>>>>>>>>
>>>>>>>>>            
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>Arghhh !!!
>>>>>>>>
>>>>>>>>MUTEX_WAKE_ALL is NOT an option in GENERIC.
>>>>>>>>I recall verifying that it is defined twice. Guess I must have looked at
>>>>>>>>the wrong source tree :-(
>>>>>>>>This means yes - we have bigger problems!
>>>>>>>>
>>>>>>>>Example:
>>>>>>>>
>>>>>>>>Thread A holds a mutex x contested by Thread B and C and has priority
>>>>>>>>pri(A).
>>>>>>>>
>>>>>>>>Thread C holds a mutex y and pri(B) < pri(C)
>>>>>>>>
>>>>>>>>Thread A releases the lock and wakes thread B but leaves C on the
>>>>>>>>turnstile wait queue.
>>>>>>>>
>>>>>>>>An interrupt thread I tries to lock mutex y owned by C.
>>>>>>>>
>>>>>>>>However priority inheritance does not work since B needs to run first
>>>>>>>>to take ownership of the lock.
>>>>>>>>
>>>>>>>>I is blocked :-(
>>>>>>>>    
>>>>>>>>
>>>>>>>>          
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>Ermm, if the interrupt happens after x is released then I's priority 
>>>>>>>should propagate from I to C to B.  
>>>>>>>  
>>>>>>>
>>>>>>>        
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>There is a hole after the mutex x is released by A - but before B can
>>>>>>claim the mutex. The turnstile for mutex x is unowned and interrupt
>>>>>>thread I when trying to donate its priority will run into:
>>>>>>
>>>>>>	if (td == NULL) {
>>>>>>		/*
>>>>>>		 * This really isn't quite right. Really
>>>>>>		 * ought to bump priority of thread that
>>>>>>		 * next acquires the lock.
>>>>>>		 */
>>>>>>		return;
>>>>>>	}
>>>>>>
>>>>>>So B needs to run and acquire the mutex before priority inheritance
>>>>>>works again and does not get a priority boost to do so. 
>>>>>>
>>>>>>This is easy to fix and MUTEX_WAKE_ALL can be removed again at that time
>>>>>>- but my time budget is limited and Peter has an interesting bug left
>>>>>>that has priority.
>>>>>>
>>>>>>
>>>>>>      
>>>>>>
>>>>>>            
>>>>>>
>>>>>I'm no closer to being able to create this panic in a controlled way.
>>>>>After a whole day of different tests I finally got this panic:
>>>>>http://www.holm.cc/stress/log/cons81.html. The trigger seems to be one
>>>>>particular Java applet, but it is not easily reproducible.
>>>>>
>>>>>- Peter
>>>>>
>>>>>
>>>>>
>>>>>    
>>>>>
>>>>>          
>>>>>
>>>>>>>If the interrupt happens before x is released, 
>>>>>>>then the final bit of propagate_priority() should handle it since it 
>>>>>>>resorts the turnstile's thread queue so that C will be awakened rather 
>>>>>>>than B.
>>>>>>>  
>>>>>>>
>>>>>>>        
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Agreed.
>>>>>>
>>>>>>	Stephan
>>>>>>
>>>>>>
>>>>>>      
>>>>>>
>>>>>>            
>>>>>>
>>>>>_______________________________________________
>>>>>freebsd-arch@freebsd.org mailing list
>>>>>http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>>>>>To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
>>>>>
>>>>>
>>>>>    
>>>>>
>>>>>          
>>>>>
>>>
>>>      
>>>
>
>  
>
>------------------------------------------------------------------------
>
>Index: kern_shutdown.c
>===================================================================
>RCS file: /home/ncvs/src/sys/kern/kern_shutdown.c,v
>retrieving revision 1.166
>diff -u -r1.166 kern_shutdown.c
>--- kern_shutdown.c	2 Sep 2004 18:59:15 -0000	1.166
>+++ kern_shutdown.c	5 Oct 2004 12:23:45 -0000
>@@ -230,10 +230,14 @@
> 		return;
> 	}
> 
>+	if (panicstr == NULL)
>+		panicstr = "In doadump()";	/* Major hack XXX pho */
> 	savectx(&dumppcb);
> 	dumptid = curthread->td_tid;
> 	dumping++;
> 	dumpsys(&dumper);
>+	if (!strcmp(panicstr, "In doadump()"))
>+		panicstr = NULL;	/* Major hack XXX pho */
> }
> 
> /*
>@@ -519,6 +523,8 @@
> #endif
> 
> #ifdef KDB
>+	if (panicstr == NULL)
>+		panicstr = "(NULL)";	/* XXX pho */
> 	if (newpanic && trace_on_panic)
> 		kdb_backtrace();
> 	if (debugger_on_panic)
>  
>
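
As an illustration of the priority propagation hole Stephan describes in
the quoted thread (mutex x released by A, B awakened but not yet owner,
C still queued on x, I blocking on y held by C), here is a minimal
userspace sketch in C. It is a toy model and not the kernel's code: the
struct layouts, scenario_init() and the return value of
propagate_priority() are assumptions made up for this example, and the
priority numbers are arbitrary. Only the early return on an unowned
turnstile mirrors the snippet quoted above.

```c
#include <assert.h>
#include <stddef.h>

struct thread;

/* Toy turnstile: only the owner matters for this illustration. */
struct turnstile {
	struct thread *ts_owner;	/* NULL while the lock is between owners */
};

/* Toy thread: a lower pri value means a higher priority. */
struct thread {
	int pri;
	struct turnstile *blocked_on;	/* turnstile waited on, or NULL */
};

/*
 * Lend td's priority along the chain of lock owners.
 * Returns 1 if the chain ended at a thread that is not blocked,
 * 0 if it stopped early at a turnstile with no owner (the hole).
 */
static int
propagate_priority(struct thread *td)
{
	int pri;
	struct turnstile *ts;

	assert(td != NULL);
	pri = td->pri;
	ts = td->blocked_on;
	while (ts != NULL) {
		struct thread *owner = ts->ts_owner;

		if (owner == NULL)
			return (0);	/* nobody to boost until the lock is claimed */
		if (owner->pri > pri)
			owner->pri = pri;	/* lend the better priority */
		ts = owner->blocked_on;
	}
	return (1);
}

/* Hypothetical container for the scenario from the thread. */
struct scenario {
	struct turnstile ts_x, ts_y;
	struct thread b, c, i;
};

static void
scenario_init(struct scenario *s)
{
	s->ts_x.ts_owner = NULL;	/* A released x, B has not claimed it yet */
	s->ts_y.ts_owner = &s->c;	/* C holds y */
	s->b.pri = 100;			/* B was awakened and is runnable */
	s->b.blocked_on = NULL;
	s->c.pri = 120;			/* C still waits for x; pri(B) < pri(C) */
	s->c.blocked_on = &s->ts_x;
	s->i.pri = 10;			/* interrupt thread I blocks on y */
	s->i.blocked_on = &s->ts_y;
}
```

In this model, calling scenario_init() and then propagate_priority(&s.i)
returns 0: C is boosted to I's priority but B keeps its own, so nothing
hurries B into claiming x. Once ts_x.ts_owner is set to &s.b (B has
claimed x), a second call returns 1 and B is boosted as well.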


