From owner-freebsd-current@FreeBSD.ORG  Mon Apr 26 14:20:01 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 186B116A4CF
	for <freebsd-current@freebsd.org>;
	Mon, 26 Apr 2004 14:20:01 -0700 (PDT)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 96C0D43D45
	for <freebsd-current@freebsd.org>;
	Mon, 26 Apr 2004 14:20:00 -0700 (PDT)
	(envelope-from eischen@vigrid.com)
Received: from mail.pcnet.com (mail.pcnet.com [204.213.232.4])
	by mail.pcnet.com (8.12.10/8.12.1) with ESMTP id i3QLJjQk012612;
	Mon, 26 Apr 2004 17:19:45 -0400 (EDT)
Date: Mon, 26 Apr 2004 17:19:45 -0400 (EDT)
From: Daniel Eischen <eischen@vigrid.com>
X-Sender: eischen@pcnet5.pcnet.com
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <Pine.BSF.4.21.0404261154250.45911-100000@InterJet.elischer.org>
Message-ID: <Pine.GSO.4.10.10404261714360.10647-100000@pcnet5.pcnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: freebsd-current@freebsd.org
cc: Dan Nelson <dnelson@allantgroup.com>
Subject: Re: panic: Exit: Single threading fouled up
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Mon, 26 Apr 2004 21:20:01 -0000

On Mon, 26 Apr 2004, Julian Elischer wrote:
> 
> On Mon, 26 Apr 2004, Dan Nelson wrote:
> 
> > In the last episode (Apr 26), Gavin Atkinson said:
> > > I've seen this panic twice now, once on a heavily loaded UP machine
> > > running gnome at the time, and once on an SMP (hyperthreaded) machine
> > > which was mostly idle as it was shutting down. Both running with ULE.
> > 
> > I've gotten it 6 times while running the pike testsuite, but not
> > reliably enough that I can run a WITNESS kernel for a couple hours and
> > catch it.  SMP system, 4BSD scheduler, libpthread.  Hangs trying to
> > flush buffers so it has never generated a crashdump.  The couple of
> > times I was able to break into the debugger before the hang, a ps
> > showed most of the processes in the system waiting for the "proctree"
> > mutex.
> 
> By chance I'm reading around that code at the moment..
> here's what is going on..
> 
> When a threaded process exits, all threads except the one that is
> actually doing the exit() are forced to abort.
> THEORETICALLY the first thread to get to this code
> should run thread_single() and if another thread calls exit() it should
> block looking for the proc lock until the first thread has successfully
> set the "die-you-scum" flag and then proceed on and see that flag and
> just commit suicide.
> 
> actual code is (simplified):
> 
>         PROC_LOCK(p);
>         if (p->p_flag & P_SA || p->p_numthreads > 1) {
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         panic ("Exit: Single threading fouled up");
>                 [...]
> where
> thread_single(SINGLE_EXIT)
> is: (simplified)
> 
> int
> thread_single(int force_exit)
> {
>         struct thread *td;
>         struct proc *p;
> 
>         td = curthread;
>         p = td->td_proc;
>           
>         if ((p->p_flag & P_SA) == 0 && p->p_numthreads == 1)
>                 return (0);
> 
>         /* Is someone already single threading? */
>         if (p->p_singlethread)
>                 return (1);
> 
>         if (force_exit == SINGLE_EXIT) {
>                 p->p_flag |= P_SINGLE_EXIT;
>         } else
>                 p->p_flag &= ~P_SINGLE_EXIT;
>         p->p_flag |= P_STOPPED_SINGLE;
>         mtx_lock_spin(&sched_lock);
>         p->p_singlethread = td;
> [...]
> 	[set flags that should trigger thread_suspend_check()]
> 	return (0);
> }
> 
> This means that, despite the fact that the proc lock is required,
> two threads have managed to get into the same code.
> 
> The thread_suspend_check(0) call for the 2nd thread coming in should
> just abort the thread and never return, so it should never proceed on
> to see p->p_singlethread already set to non-NULL.
> 
> But it does, hence this panic.
> 
> Possibly something is setting it earlier and forgetting to unset it
> (there are other times we force single-threading, e.g. in fork or
> exec).
> 
> It is unset using the thread_single_end() call.
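
The "set earlier and never unset" theory quoted above can be modeled in
user space. This is only a sketch under stated assumptions: every
model_* name is made up for illustration, P_SA and all locking are left
out, and none of it is the kernel code. It only shows why a leftover
p_singlethread would make the later call from exit return nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct thread and struct proc. */
struct model_thread {
	int id;
};

struct model_proc {
	int numthreads;
	struct model_thread *singlethread;	/* plays the role of p_singlethread */
};

/*
 * Same shape as the simplified thread_single() quoted above:
 * 0 means the caller is now the single thread, 1 means somebody
 * else already holds that role.
 */
static int
model_thread_single(struct model_proc *p, struct model_thread *td)
{
	assert(p != NULL && td != NULL);
	if (p->numthreads == 1)
		return (0);
	if (p->singlethread != NULL)
		return (1);
	p->singlethread = td;
	return (0);
}

/* Plays the role of thread_single_end(): clear the marker. */
static void
model_thread_single_end(struct model_proc *p)
{
	p->singlethread = NULL;
}

/*
 * Replay: an earlier single-threading request (think fork or exec)
 * is either ended properly or forgotten, then another thread exits.
 * The return value is what the exit path would test before panicking.
 */
static int
model_exit_after_earlier_request(int earlier_request_ended)
{
	struct model_thread a = { 1 }, b = { 2 };
	struct model_proc p = { 2, NULL };
	int rc;

	rc = model_thread_single(&p, &a);
	assert(rc == 0);
	if (earlier_request_ended)
		model_thread_single_end(&p);
	return (model_thread_single(&p, &b));
}
```

With the earlier request ended, the second call returns 0; with it
forgotten, the second call returns 1, which is the condition the panic
in the exit path reports.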

There is a race in kse_release() and kse_wakeup() that looks
to have been introduced when sleepqueues were added.  msleep()
drops the mutex too early (in this case the proc lock) and
TDF_SINTR gets set after the mutex has been released.  So
the code in thread_single():

				if (force_exit == SINGLE_EXIT) {
					if (TD_IS_SUSPENDED(td2)) {
						thread_unsuspend_one(td2);
					}
->					if (TD_ON_SLEEPQ(td2) &&
->					    (td2->td_flags & TDF_SINTR)) {
						sleepq_abort(td2);
					}

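may not abort the thread in the sleep queue: if thread_single() runs
after msleep() has dropped the proc lock but before TDF_SINTR is set,
the test above fails and sleepq_abort() is never called.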
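
To make the window concrete, here is a small single-threaded user-space
replay of the two orderings. It is a sketch, not kernel code: the
MODEL_* flags and model_* functions are invented for illustration and
only mirror the shape of the TD_ON_SLEEPQ()/TDF_SINTR test in the
excerpt. It also assumes the sleeper is already visible on the sleep
queue when the exiting thread looks at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for "on sleep queue" and TDF_SINTR. */
#define MODEL_ON_SLEEPQ	0x1
#define MODEL_SINTR	0x2

struct model_thread {
	int  flags;
	bool aborted;
};

/*
 * Same shape as the check in thread_single(): the sleeper is only
 * aborted if both conditions are visible at the moment of the check.
 */
static bool
model_single_exit_check(struct model_thread *td2)
{
	assert(td2 != NULL);
	if ((td2->flags & MODEL_ON_SLEEPQ) && (td2->flags & MODEL_SINTR)) {
		td2->aborted = true;
		return (true);
	}
	return (false);
}

/*
 * Replay one interleaving. If sintr_set_before_unlock is false, the
 * interruptible flag is only set after the exiting thread has already
 * done its check, which is the ordering described for msleep().
 */
static bool
model_replay(bool sintr_set_before_unlock)
{
	struct model_thread sleeper = { 0, false };
	bool aborted;

	sleeper.flags |= MODEL_ON_SLEEPQ;
	if (sintr_set_before_unlock)
		sleeper.flags |= MODEL_SINTR;

	/* The lock is dropped here, so the exiting thread gets to run. */
	aborted = model_single_exit_check(&sleeper);

	if (!sintr_set_before_unlock)
		sleeper.flags |= MODEL_SINTR;	/* too late to be seen */

	return (aborted);
}
```

model_replay(true) reports the sleeper as aborted; model_replay(false)
leaves it asleep even though it ends up marked interruptible.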

-- 
Dan Eischen