From: Daniel Eischen
Date: Mon, 26 Apr 2004 17:19:45 -0400 (EDT)
To: Julian Elischer
cc: freebsd-current@freebsd.org, Dan Nelson
Subject: Re: panic: Exit: Single threading fouled up

On Mon, 26 Apr 2004, Julian Elischer wrote:

>
> On Mon, 26 Apr 2004, Dan Nelson wrote:
>
> > In the last episode (Apr 26), Gavin Atkinson said:
> > > I've seen this panic twice now, once on a heavily loaded UP machine
> > > running gnome at the time, and once on an SMP (hyperthreaded) machine
> > > which was mostly idle as it was shutting down. Both running with ULE.
> >
> > I've gotten it 6 times while running the pike testsuite, but not
> > reliably enough that I can run a WITNESS kernel for a couple hours and
> > catch it. SMP system, 4BSD scheduler, libpthread. Hangs trying to
> > flush buffers so it has never generated a crashdump.
> > The couple of times I was able to break into the debugger before the
> > hang, a ps showed most of the processes in the system waiting for the
> > "proctree" mutex.
>
> By chance I'm reading around that code at the moment.
> Here's what is going on.
>
> When a threaded process exits, all threads except the one that is
> actually doing the exit() are forced to abort.
> THEORETICALLY the first thread to get to this code
> should run thread_single(), and if another thread calls exit() it should
> block waiting for the proc lock until the first thread has successfully
> set the "die-you-scum" flag, then proceed, see that flag and
> just commit suicide.
>
> The actual code is (simplified):
>
> 	PROC_LOCK(p);
> 	if (p->p_flag & P_SA || p->p_numthreads > 1) {
> 		thread_suspend_check(0);
> 		if (thread_single(SINGLE_EXIT))
> 			panic("Exit: Single threading fouled up");
> 	}
>
> where thread_single(SINGLE_EXIT) is (simplified):
>
> thread_single(int force_exit)
> {
> 	struct thread *td;
> 	struct proc *p;
>
> 	td = curthread;
> 	p = td->td_proc;
>
> 	if ((p->p_flag & P_SA) == 0 && p->p_numthreads == 1)
> 		return (0);
>
> 	/* Is someone already single threading? */
> 	if (p->p_singlethread)
> 		return (1);
>
> 	if (force_exit == SINGLE_EXIT) {
> 		p->p_flag |= P_SINGLE_EXIT;
> 	} else
> 		p->p_flag &= ~P_SINGLE_EXIT;
> 	p->p_flag |= P_STOPPED_SINGLE;
> 	mtx_lock_spin(&sched_lock);
> 	p->p_singlethread = td;
> 	[...]
> 	[set flags that should trigger thread_suspend_check()]
> 	return (0);
> }
>
> This means that, despite the fact that the proc lock is required,
> two threads have managed to get into the same code.
>
> The thread_suspend_check(0) call made by the second thread coming in
> should just abort that thread and never return, so it should never
> proceed on to see p->p_singlethread already set to non-NULL.
>
> But it does, hence this panic.
>
> Possibly something is setting it earlier (there are other times we
> force single-threading, e.g. in fork or exec) and forgetting to
> unset it.
>
> It is unset using the thread_single_end() call.

There is a race in kse_release() and kse_wakeup() that looks to have
been introduced when sleepqueues were added. msleep() drops the mutex
(in this case the proc lock) too early: TDF_SINTR gets set only after
the mutex has been released, so another thread holding the proc lock
can see a thread that is already on the sleep queue but does not yet
have TDF_SINTR set. As a result, the code in thread_single():

	if (force_exit == SINGLE_EXIT) {
		if (TD_IS_SUSPENDED(td2)) {
			thread_unsuspend_one(td2);
		}
->		if (TD_ON_SLEEPQ(td2) &&
->		    (td2->td_flags & TDF_SINTR)) {
			sleepq_abort(td2);
		}

may not abort the thread in the sleep queue.

-- 
Dan Eischen