From owner-freebsd-threads@FreeBSD.ORG  Fri Sep 17 04:40:49 2004
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id D6F0516A4CE; Fri, 17 Sep 2004 04:40:49 +0000 (GMT)
Received: from pimout3-ext.prodigy.net (pimout3-ext.prodigy.net
	[207.115.63.102])	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 05A0243D2F; Fri, 17 Sep 2004 04:40:49 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (adsl-64-164-9-59.dsl.snfc21.pacbell.net
	[64.164.9.59])i8H4ekNm033004;	Fri, 17 Sep 2004 00:40:47 -0400
Message-ID: <414A6ACD.2020600@elischer.org>
Date: Thu, 16 Sep 2004 21:40:45 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4b) Gecko/20030524
X-Accept-Language: en, hu
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
	<414942B3.1060703@elischer.org>
	<16713.38977.864343.415015@grasshopper.cs.duke.edu>
	<200409161316.43010.jhb@FreeBSD.org>
In-Reply-To: <200409161316.43010.jhb@FreeBSD.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
cc: Andrew Gallatin <gallatin@cs.duke.edu>
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Sep 2004 04:40:50 -0000

John Baldwin wrote:
> On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote:
> 
>>Julian Elischer writes:
>> > Andrew, please try -current on its own now..
>> > I have checked in some fixes that have helped others.
>>
>>OK, preemption off... Still a system lockup, but a little different.
>>
>>The interesting thing here is that continuing and breaking into the
>>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
>>exit.  I've seen him in thread_single, thread_suspend_check, and in
>>exit itself at kern_exit.c:163, etc.  A breakpoint in
>>thread_suspend_one never triggers, so I guess he's holding the proc
>>lock and just looping forever.  A breakpoint in _mtx_assert() shows
>>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
>>Over and over.
> 
> 
> There is definitely some sort of infinite loop here.  Stripping out the 
> comments in exit1() for that section of code reveals basically:
> 
>         PROC_LOCK(p);
>         if (p->p_flag & P_HADTHREADS) {
> retry:
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         goto retry;
>         }
>         p->p_flag |= P_WEXIT;
>         PROC_UNLOCK(p);
> 
> So it's easy to see how it can get stuck in a loop I think.  If thread_single() 
> never drops the lock then other threads that are waiting to die can't 
> actually wait because they can never get the proc lock so that they can die.
> 


hmm, interesting..
but this code hasn't changed in ages...


in thread_single we see:

                 thread_suspend_one(td);
                 PROC_UNLOCK(p);
                 mi_switch(SW_VOL, NULL);
                 mtx_unlock_spin(&sched_lock);
                 PROC_LOCK(p);
                 mtx_lock_spin(&sched_lock);

so when it sleeps it releases the proc lock.
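
To make the point concrete, here is a rough userland analogy of that
pattern. It is only a sketch and not kernel code: struct fake_proc,
worker() and single_thread_demo() are names made up for illustration,
a pthread mutex stands in for the proc lock, and pthread_cond_wait()
stands in for the PROC_UNLOCK / mi_switch / PROC_LOCK sequence, since
it releases the mutex while the caller sleeps and retakes it on wakeup.
The other threads can only finish because the waiter lets go of the
lock while it is asleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for struct proc; not the kernel structure. */
struct fake_proc {
	pthread_mutex_t lock;     /* plays the role of the proc lock */
	pthread_cond_t  changed;  /* signalled each time a worker exits */
	int             running;  /* workers that have not exited yet */
};

/* A worker can only "die" once it manages to take the lock. */
static void *
worker(void *arg)
{
	struct fake_proc *p = arg;

	pthread_mutex_lock(&p->lock);
	p->running--;
	pthread_cond_signal(&p->changed);
	pthread_mutex_unlock(&p->lock);
	return (NULL);
}

/*
 * The "exiting" thread takes the lock, starts the workers, and waits
 * for all of them to go away. Returns the number still running, which
 * is 0 once every worker has been able to take the lock and exit.
 */
int
single_thread_demo(int nworkers)
{
	struct fake_proc p;
	pthread_t tids[MAX_WORKERS];
	int i, left;

	assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.changed, NULL);
	p.running = nworkers;

	pthread_mutex_lock(&p.lock);    /* analogous to PROC_LOCK(p) */
	for (i = 0; i < nworkers; i++) {
		if (pthread_create(&tids[i], NULL, worker, &p) != 0)
			abort();
	}
	/*
	 * pthread_cond_wait() drops the lock while sleeping and retakes
	 * it before returning, so the workers get a chance to run.
	 * Spinning here without ever dropping the lock would never end.
	 */
	while (p.running > 0)
		pthread_cond_wait(&p.changed, &p.lock);
	left = p.running;
	pthread_mutex_unlock(&p.lock);  /* analogous to PROC_UNLOCK(p) */

	for (i = 0; i < nworkers; i++)
		pthread_join(tids[i], NULL);
	pthread_cond_destroy(&p.changed);
	pthread_mutex_destroy(&p.lock);
	return (left);
}
```

Calling single_thread_demo(4) from a small main() and building with
-pthread should return 0. If the wait loop is replaced by a busy loop
that keeps the mutex held, the workers block in pthread_mutex_lock()
forever, which is the kind of hang John describes above.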