Date:      Mon, 26 Apr 2004 17:19:45 -0400 (EDT)
From:      Daniel Eischen <eischen@vigrid.com>
To:        Julian Elischer <julian@elischer.org>
Cc:        Dan Nelson <dnelson@allantgroup.com>
Subject:   Re: panic: Exit: Single threading fouled up
Message-ID:  <Pine.GSO.4.10.10404261714360.10647-100000@pcnet5.pcnet.com>
In-Reply-To: <Pine.BSF.4.21.0404261154250.45911-100000@InterJet.elischer.org>

On Mon, 26 Apr 2004, Julian Elischer wrote:
> 
> On Mon, 26 Apr 2004, Dan Nelson wrote:
> 
> > In the last episode (Apr 26), Gavin Atkinson said:
> > > I've seen this panic twice now, once on a heavily loaded UP machine
> > > running gnome at the time, and once on an SMP (hyperthreaded) machine
> > > which was mostly idle as it was shutting down. Both running with ULE.
> > 
> > I've gotten it 6 times while running the pike testsuite, but not
> > reliably enough that I can run a WITNESS kernel for a couple hours and
> > catch it.  SMP system, 4BSD scheduler, libpthread.  Hangs trying to
> > flush buffers so it has never generated a crashdump.  The couple of
> > times I was able to break into the debugger before the hang, a ps
> > showed most of the processes in the system waiting for the "proctree"
> > mutex.
> 
> By chance I'm reading around that code at the moment..
> here's what is going on..
> 
> When a threaded process exits, all threads except the one that is
> actually doing the exit() are forced to abort.
> THEORETICALLY the first thread to get to this code
> should run thread_single() and if another thread calls exit() it should
> block looking for the proc lock until the first thread has successfully
> set the "die-you-scum" flag and then proceed on and see that flag and
> just commit suicide.
> 
> actual code is (simplified):
> 
>         PROC_LOCK(p);
>         if (p->p_flag & P_SA || p->p_numthreads > 1) {
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         panic ("Exit: Single threading fouled up");
>                 /*
> where
> thread_single(SINGLE_EXIT)
> is: (simplified)
> 
> thread_single(int force_exit)
> {
>         struct proc *p;
> 
>         td = curthread;
>         p = td->td_proc;
>           
>         if ((p->p_flag & P_SA) == 0 && p->p_numthreads == 1)
>                 return (0);
> 
>         /* Is someone already single threading? */
>         if (p->p_singlethread)
>                 return (1);
> 
>         if (force_exit == SINGLE_EXIT) {
>                 p->p_flag |= P_SINGLE_EXIT;
>         } else
>                 p->p_flag &= ~P_SINGLE_EXIT;
>         p->p_flag |= P_STOPPED_SINGLE;
>         mtx_lock_spin(&sched_lock);
>         p->p_singlethread = td;
> [...]
> 	[set flags that should trigger thread_suspend_check()]
> 	return (0);
> }
> 
> This means that, despite the fact that the proc lock is required,
> two threads have managed to get into the same code.
> 
> the thread_suspend_check(0); for the 2nd thread coming in should just
> abort the thread and never return, so it should never proceed on to see
> the p->p_singlethread already set to non-NULL.
> 
> but it does.. hence this panic..
> 
> possibly something is setting it earlier 
> (There are other times we force single-threading)
> and forgetting to unset it.
> (e.g. in fork or exec) 
> 
> it is unset using the thread_single_end() call.

There is a race in kse_release() and kse_wakeup() that appears to
have been introduced when sleepqueues were added.  msleep() drops
the mutex (in this case the proc lock) too early, and TDF_SINTR
gets set after the mutex has been released.  So the code in
thread_single():

				if (force_exit == SINGLE_EXIT) {
					if (TD_IS_SUSPENDED(td2)) {
						thread_unsuspend_one(td2);
					}
->					if (TD_ON_SLEEPQ(td2) &&
->					    (td2->td_flags & TDF_SINTR)) {
						sleepq_abort(td2);
					}

may not abort the thread in the sleep queue.
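
To make the window concrete, here is a minimal user-space model of
that interleaving.  It is only a sketch, not kernel code: struct
model_thread, the helper functions and run_interleaving() are all
hypothetical names, and it assumes the sleeping thread is already on
the sleep queue at the point where the lock is dropped.  It only shows
that a check of the form "on the sleep queue AND TDF_SINTR set" can
miss the sleeper if it runs before the flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few bits of thread state that matter here. */
struct model_thread {
	bool on_sleepq;	/* models TD_ON_SLEEPQ(td2) */
	bool sintr;	/* models (td2->td_flags & TDF_SINTR) */
	bool aborted;	/* set where the real code would call sleepq_abort() */
};

/* Sleeper, step 1: goes onto the sleep queue (assumed to precede the unlock). */
static void
sleeper_enqueue(struct model_thread *t)
{
	t->on_sleepq = true;
}

/* Sleeper, step 2: marks itself interruptible, after the lock is gone. */
static void
sleeper_mark_interruptible(struct model_thread *t)
{
	t->sintr = true;
}

/* Exiting thread: same shape as the marked test in thread_single(). */
static void
exiter_check(struct model_thread *t)
{
	if (t->on_sleepq && t->sintr)
		t->aborted = true;
}

/*
 * Replay one ordering of the two threads.  If exiter_in_window is true,
 * the exiting thread runs its check between the lock release and the
 * point where the sleeper sets the flag.  Returns whether the sleeper
 * was aborted.
 */
bool
run_interleaving(bool exiter_in_window)
{
	struct model_thread t = { false, false, false };

	sleeper_enqueue(&t);
	/* The proc lock is dropped here, before the flag is set. */
	assert(!t.sintr);

	if (exiter_in_window)
		exiter_check(&t);	/* sees on_sleepq but not sintr */

	sleeper_mark_interruptible(&t);

	if (!exiter_in_window)
		exiter_check(&t);	/* sees both and aborts the sleep */

	return (t.aborted);
}
```

In this model run_interleaving(true) leaves the sleeper un-aborted,
while run_interleaving(false) aborts it.  The first case corresponds
to the thread that is left sleeping after the exiting thread has
already made its pass.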

-- 
Dan Eischen


