Message-ID: <414A6ACD.2020600@elischer.org>
Date: Thu, 16 Sep 2004 21:40:45 -0700
From: Julian Elischer
To: John Baldwin
cc: Andrew Gallatin
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
 <414942B3.1060703@elischer.org>
 <16713.38977.864343.415015@grasshopper.cs.duke.edu>
 <200409161316.43010.jhb@FreeBSD.org>
In-Reply-To: <200409161316.43010.jhb@FreeBSD.org>

John Baldwin wrote:
> On Thursday 16 September 2004 09:42 am, Andrew Gallatin wrote:
>
>>Julian Elischer writes:
>> > Andrew, please try -current on its own now..
>> > I have checked in some fixes that have helped others.
>>
>>OK, preemption off...  Still a system lockup, but a little different.
>>
>>The interesting thing here is that continuing and breaking into the
>>debugger repeatedly seems to show that thread 0xc1646af0 is looping in
>>exit.  I've seen him in thread_single, thread_suspend_check, and in
>>exit itself at kern_exit.c:163, etc.  A breakpoint in
>>thread_suspend_one never triggers, so I guess he's holding the proc
>>lock and just looping forever.  A breakpoint in _mtx_assert() shows
>>him asserting the proc lock in thread_suspend_check at kern_thread.c:898.
>>Over and over.
>
>
> There is definitely some sort of infinite loop here.  Stripping out the
> comments in exit1() for that section of code reveals basically:
>
>         PROC_LOCK(p);
>         if (p->p_flag & P_HADTHREADS) {
> retry:
>                 thread_suspend_check(0);
>                 if (thread_single(SINGLE_EXIT))
>                         goto retry;
>         }
>         p->p_flag |= P_WEXIT;
>         PROC_UNLOCK(p);
>
> So it's easy to see how it can get stuck in a loop I think.  If thread_single()
> never drops the lock then other threads that are waiting to die can't
> actually wait because they can never get the proc lock so that they can die.
>

hmm interesting..
but this code hasn't changed in ages...

in thread_single we see:

                thread_suspend_one(td);
                PROC_UNLOCK(p);
                mi_switch(SW_VOL, NULL);
                mtx_unlock_spin(&sched_lock);
                PROC_LOCK(p);
                mtx_lock_spin(&sched_lock);

so when it sleeps it releases the proc lock.
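
To make the two positions above concrete, here is a toy, single-threaded model of the argument. It is only a sketch: none of these names (model_proc, model_run_others, model_exit) exist in the kernel, the "scheduler" is a plain loop, and it says nothing about where the real hang comes from. It only shows that a waiter which keeps the lock starves the threads that need it in order to die, while a waiter which drops the lock around its sleep lets them finish.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are hypothetical names, not FreeBSD kernel APIs. */
struct model_proc {
	bool lock_held;     /* stands in for the proc lock */
	int  others_alive;  /* threads that need the lock before they can die */
};

/*
 * One scheduling opportunity for the other threads.  A thread can only
 * take the lock and exit if the exiting thread is not holding it.
 */
static void
model_run_others(struct model_proc *p)
{
	if (!p->lock_held && p->others_alive > 0)
		p->others_alive--;
}

/*
 * Model of the exiting thread waiting for the others to go away.
 * Returns how many other threads are still alive after at most
 * max_spins wait iterations.
 */
static int
model_exit(int others, bool sleep_drops_lock, int max_spins)
{
	struct model_proc p = { .lock_held = true, .others_alive = others };

	assert(others >= 0 && max_spins >= 0);
	for (int spin = 0; spin < max_spins && p.others_alive > 0; spin++) {
		if (sleep_drops_lock)
			p.lock_held = false;  /* like PROC_UNLOCK(p) before sleeping */
		model_run_others(&p);         /* like mi_switch(): others may run */
		p.lock_held = true;           /* like PROC_LOCK(p) after waking */
	}
	return (p.others_alive);
}
```

In this model, model_exit(3, true, 100) returns 0 (everyone gets to die), while model_exit(3, false, 100) gives up with all 3 threads still alive, which is the starvation John describes.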