Date:      Fri, 17 Sep 2004 07:47:29 -0400 (EDT)
From:      Andrew Gallatin <gallatin@cs.duke.edu>
To:        Julian Elischer <julian@elischer.org>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: Unkillable KSE threaded proc
Message-ID:  <16714.52945.827195.748164@grasshopper.cs.duke.edu>
In-Reply-To: <414A6ACD.2020600@elischer.org>
References:  <16703.11479.679335.588170@grasshopper.cs.duke.edu> <414942B3.1060703@elischer.org> <16713.38977.864343.415015@grasshopper.cs.duke.edu> <200409161316.43010.jhb@FreeBSD.org> <414A6ACD.2020600@elischer.org>

Julian Elischer writes:
 > John Baldwin wrote:
 > > On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote:
 > > 
 > >>Julian Elischer writes:
 > >> > Andrew, please try -current on its own now..
 > >> > I have checked in some fixes that have helped others.
 > >>
 > >>OK, preemption off... Still a system lockup, but a little different.
 > >>
 > >>The interesting thing here is that continuing and breaking into the
 > >>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
 > >>exit.  I've seen him in thread_single, thread_suspend_check, and in
 > >>exit itself at kern_exit.c:163, etc.  A breakpoint in
 > >>thread_suspend_one never triggers, so I guess he's holding the proc
 > >>lock and just looping forever.  A breakpoint in _mtx_assert() shows
 > >>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
 > >>Over and over.
 > > 
 > > 
 > > There is definitely some sort of infinite loop here.  Stripping out the 
 > > comments in exit1() for that section of code reveals basically:
 > > 
 > >         PROC_LOCK(p);
 > >         if (p->p_flag & P_HADTHREADS) {
 > > retry:
 > >                 thread_suspend_check(0);
 > >                 if (thread_single(SINGLE_EXIT))
 > >                         goto retry;
 > >         }
 > >         p->p_flag |= P_WEXIT;
 > >         PROC_UNLOCK(p);
 > > 
 > > So it's easy to see how it can get stuck in a loop I think.  If thread_single() 
 > > never drops the lock then other threads that are waiting to die can't 
 > > actually wait because they can never get the proc lock so that they can die.
 > > 
 > 
 > 
 > hmm interesting..
 > but this code hasn't changed in ages...
 > 
 > 
 > in thread_single we see:
 > 
 >                  thread_suspend_one(td);
 >                  PROC_UNLOCK(p);
 >                  mi_switch(SW_VOL, NULL);
 >                  mtx_unlock_spin(&sched_lock);
 >                  PROC_LOCK(p);
 >                  mtx_lock_spin(&sched_lock);
 > 
 > so when it sleeps it releases the proc lock.

But that's the problem.  As I said above, a breakpoint in thread_suspend_one
never triggers, so this code is never called.  It must be bailing
out of thread_single() before this happens.

Did somebody fix ddb?  If yes, I can try stepping through it if you like.

Maybe a quick fix would be to drop the proc lock and tsleep for a
clock tick at the bottom of the infinite loop...
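
To illustrate the idea only, here is a rough user-space sketch with
pthreads.  It is not kernel code and not a tested patch: struct
fake_proc, worker_die() and exit_with_backoff() are made-up names, the
mutex stands in for the proc lock, and nanosleep() for 1 ms stands in
for a one-tick tsleep.  The point is just that the other threads can
only die once the exiting thread lets go of the lock inside its retry
loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_WORKERS 64

/* Hypothetical stand-in for struct proc; this is not kernel code. */
struct fake_proc {
	pthread_mutex_t lock;	/* plays the role of the proc lock */
	int numthreads;		/* live threads, including the exiting one */
};

/* A worker can only "die" once it manages to take the lock. */
static void *
worker_die(void *arg)
{
	struct fake_proc *p = arg;

	pthread_mutex_lock(&p->lock);
	p->numthreads--;
	pthread_mutex_unlock(&p->lock);
	return (NULL);
}

/*
 * Exiting thread: retry until it is the only thread left, dropping the
 * lock and sleeping about one tick (assumed to be 1 ms here) on each
 * pass.  Returns the thread count seen when the loop ends.
 */
int
exit_with_backoff(int nworkers)
{
	struct fake_proc p;
	pthread_t tids[MAX_WORKERS];
	struct timespec tick = { 0, 1000000 };
	int i, started = 0, remaining;

	assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
	pthread_mutex_init(&p.lock, NULL);
	p.numthreads = nworkers + 1;

	/* Take the lock first so every worker blocks in worker_die(). */
	pthread_mutex_lock(&p.lock);
	for (i = 0; i < nworkers; i++) {
		if (pthread_create(&tids[started], NULL, worker_die, &p) == 0)
			started++;
		else
			p.numthreads--;	/* never started, nothing to wait for */
	}

	while (p.numthreads > 1) {
		/* Without these three lines the loop would spin forever. */
		pthread_mutex_unlock(&p.lock);
		nanosleep(&tick, NULL);
		pthread_mutex_lock(&p.lock);
	}
	remaining = p.numthreads;
	pthread_mutex_unlock(&p.lock);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&p.lock);
	return (remaining);
}
```

Built with -pthread, a call such as exit_with_backoff(4) comes back
once all four workers have gone away.  Take out the unlock/sleep/lock
in the loop and it hangs, which is roughly what exit1() appears to be
doing here.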

Drew


