Date: Mon, 26 Apr 2004 17:19:45 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
To: Julian Elischer <julian@elischer.org>
Cc: Dan Nelson <dnelson@allantgroup.com>
Subject: Re: panic: Exit: Single threading fouled up
Message-ID: <Pine.GSO.4.10.10404261714360.10647-100000@pcnet5.pcnet.com>
In-Reply-To: <Pine.BSF.4.21.0404261154250.45911-100000@InterJet.elischer.org>

On Mon, 26 Apr 2004, Julian Elischer wrote:

>
> On Mon, 26 Apr 2004, Dan Nelson wrote:
>
> > In the last episode (Apr 26), Gavin Atkinson said:
> > > I've seen this panic twice now, once on a heavily loaded UP machine
> > > running gnome at the time, and once on an SMP (hyperthreaded) machine
> > > which was mostly idle as it was shutting down. Both running with ULE.
> >
> > I've gotten it 6 times while running the pike testsuite, but not
> > reliably enough that I can run a WITNESS kernel for a couple hours and
> > catch it. SMP system, 4BSD scheduler, libpthread. Hangs trying to
> > flush buffers so it has never generated a crashdump. The couple of
> > times I was able to break into the debugger before the hang, a ps
> > showed most of the processes in the system waiting for the "proctree"
> > mutex.
>
> By chance I'm reading around that code at the moment..
> here's what is going on..
>
> When a threaded process exits, all threads except the one that is
> actually doing the exit() are forced to abort.
> THEORETICALLY the first thread to get to this code
> should run thread_single() and if another thread calls exit() it should
> block looking for the proc lock until the first thread has successfully
> set the "die-you-scum" flag and then proceed on and see that flag and
> just commit suicide.
>
> actual code is (simplified):
>
>         PROC_LOCK(p);
>         if (p->p_flag & P_SA || p->p_numthreads > 1) {
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         panic ("Exit: Single threading fouled up");
>                 /*
> where
> thread_single(SINGLE_EXIT)
> is: (simplified)
>
> thread_single(int force_exit)
> {
>         struct proc *p;
>
>         td = curthread;
>         p = td->td_proc;
>
>         if ((p->p_flag & P_SA) == 0 && p->p_numthreads == 1)
>                 return (0);
>
>         /* Is someone already single threading? */
>         if (p->p_singlethread)
>                 return (1);
>
>         if (force_exit == SINGLE_EXIT) {
>                 p->p_flag |= P_SINGLE_EXIT;
>         } else
>                 p->p_flag &= ~P_SINGLE_EXIT;
>         p->p_flag |= P_STOPPED_SINGLE;
>         mtx_lock_spin(&sched_lock);
>         p->p_singlethread = td;
>         [...]
>         [set flags that should trigger thread_suspend_check()]
>         return (0);
> }
>
> This means that, despite the fact that the proc lock is required,
> two threads have managed to get into the same code.
>
> the thread_suspend_check(0); for the 2nd thread coming in should just
> abort the thread and never return, so it should never proceed on to see
> the p->p_singlethread already set to non-NULL.
>
> but it does.. hence this panic..
>
> possibly something is setting it earlier
> (There are other times we force single-threading)
> and forgetting to unset it.
> (e.g. in fork or exec)
>
> it is unset using the thread_single_end() call.

There is a race in kse_release() and kse_wakeup() that looks
to have been introduced when sleepqueues were added. msleep()
drops the mutex too early (in this case the proc lock) and
TDF_SINTR gets set after the mutex has been released. So
the code in thread_single():

        if (force_exit == SINGLE_EXIT) {
                if (TD_IS_SUSPENDED(td2)) {
                        thread_unsuspend_one(td2);
                }
->              if (TD_ON_SLEEPQ(td2) &&
->                  (td2->td_flags & TDF_SINTR)) {
                        sleepq_abort(td2);
                }

may not abort the thread in the sleep queue.

-- 
Dan Eischen
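
The ordering problem described above can be illustrated with a small userland model. This is only a sketch and not kernel code: `model_thread`, `MODEL_TDF_SINTR`, `would_abort()`, `exiter_aborts_sleeper()` and `check_model()` are hypothetical names made up for the illustration, and a single-threaded walk through the steps stands in for a real interleaving of two threads. It assumes the exiting thread gets the proc lock at the first moment the sleeper has released it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's TDF_SINTR flag bit. */
enum { MODEL_TDF_SINTR = 0x1 };

/* Hypothetical model of the two pieces of thread state the check reads. */
struct model_thread {
    bool on_sleepq;  /* stands in for TD_ON_SLEEPQ(td) */
    int  flags;      /* stands in for td->td_flags */
};

/* The condition thread_single() tests before calling sleepq_abort(). */
static bool would_abort(const struct model_thread *td)
{
    return td->on_sleepq && (td->flags & MODEL_TDF_SINTR) != 0;
}

/* The two orderings of "set the flag" and "drop the lock" in the sleeper. */
enum sleep_order { FLAG_BEFORE_UNLOCK, UNLOCK_BEFORE_FLAG };

/*
 * Step the sleeper through going to sleep and report what an exiting
 * thread decides if it runs its check as soon as the lock is dropped.
 */
static bool exiter_aborts_sleeper(enum sleep_order order)
{
    struct model_thread td = { false, 0 };

    td.on_sleepq = true;                 /* sleeper is on the sleep queue */
    if (order == FLAG_BEFORE_UNLOCK)
        td.flags |= MODEL_TDF_SINTR;     /* flag set while lock still held */

    /* Lock is dropped here; the exiting thread gets in and checks. */
    bool decision = would_abort(&td);

    if (order == UNLOCK_BEFORE_FLAG)
        td.flags |= MODEL_TDF_SINTR;     /* flag set too late to be seen */

    return decision;
}

/* Usage: the abort only happens when the flag is set before the unlock. */
void check_model(void)
{
    assert(exiter_aborts_sleeper(FLAG_BEFORE_UNLOCK));
    assert(!exiter_aborts_sleeper(UNLOCK_BEFORE_FLAG));
}
```

In the second ordering the sleeper is already on the sleep queue but the interruptible flag is not yet visible, so the check fails and the sleeper is left asleep, which is the window described above.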