Message-ID: <41478D81.2010005@elischer.org>
Date: Tue, 14 Sep 2004 17:32:01 -0700
From: Julian Elischer
To: Andrew Gallatin
Cc: John Baldwin
Cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc

Andrew, if
you get a chance, there is a patch at
http://www.freebsd.org/~julian/q.diff
that has some debugging in it whose results I'd like to see.
If it crashes or hangs processes without triggering the debugging
code, then even that tells me something :-)

Andrew Gallatin wrote:

>FWIW, for the case where there is one lingering thread, calling
>thread_unsuspend_one() on it seems to get it to exit..
>
>Maybe there is some sort of race while exiting which causes the wrong
>number of threads to be either suspended, or unsuspended. If too many
>are suspended, one is left lingering. If too few are suspended, the
>system deadlocks because a thread never gets off the cpu.
>
>Would it help at all to try with libthr and see what it does?
>Let me know what more I can do to help get this fixed..
>
>Drew
>
>PS:
>By "one lingering thread", I mean the case I first complained about.
>Eg:
>
>540 c164e700 e52e1000 1387 1 538 000c482 (threaded) mx_pingpong
>  thread 0xc1fb8320 ksegrp 0xc15bb850 [SUSP]
>
>db> tr 540
>sched_switch(c1fb8320,0,0,15fc9814,e30bebc7) at sched_switch+0xd8
>mi_switch(1,0,e881fc44,c051e6dd,c1fb8320) at mi_switch+0x1c7
>thread_single(1,c06eaae0,e881fc64,c164e700,c1fb8320) at thread_single+0x1d7
>exit1(c1fb8320,9,0,e881fce4,c051877e) at exit1+0x115
>expand_name(c1fb8320,9,100,0,0) at expand_name
>postsig(9,202,c06e5dd8,17f,8058f84) at postsig+0x204
>ast(e881fd48) at ast+0x5e4
>doreti_ast() at doreti_ast+0x17
>db> call thread_unsuspend_one(0xc1fb8320)
>0xc1562640
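
The race Drew hypothesizes in the quoted text can be written out as a toy model. This is only a sketch of his stated reasoning and not FreeBSD kernel code: the function name exit_outcome, its parameters, and the rule that an exiting thread needs exactly "all other threads" suspended are assumptions made for illustration.

```python
def exit_outcome(total_threads: int, suspended: int) -> str:
    """Toy model (hypothetical, not kernel code) of the suspected race.

    Assumption: the exiting thread can finish only when every *other*
    thread in the process is suspended, i.e. suspended == total - 1.
    """
    if total_threads < 1 or not 0 <= suspended <= total_threads:
        raise ValueError("suspended must be between 0 and total_threads")

    expected = total_threads - 1
    if suspended == expected:
        return "clean exit"
    if suspended > expected:
        # One suspension too many: the thread that should be exiting
        # is itself parked, as with the [SUSP] thread in the listing.
        return "lingering suspended thread"
    # One suspension too few: some thread keeps running, so the
    # exiting thread never sees the state it is waiting for.
    return "exiting thread never proceeds"


if __name__ == "__main__":
    for count in (2, 3, 4):
        print(count, "suspended of 4:", exit_outcome(4, count))
```

In this model the process in the listing corresponds to exit_outcome(1, 1): a single remaining thread that is itself suspended, which would be consistent with it exiting once thread_unsuspend_one() is called on it by hand.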