From owner-freebsd-threads@FreeBSD.ORG  Wed Sep  8 22:54:52 2004
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 9362716A4CE
	for <freebsd-threads@freebsd.org>;
	Wed,  8 Sep 2004 22:54:52 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7A00343D2D
	for <freebsd-threads@freebsd.org>;
	Wed,  8 Sep 2004 22:54:52 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id 135647A3D2; Wed,  8 Sep 2004 15:54:52 -0700 (PDT)
Message-ID: <413F8DBB.5040502@elischer.org>
Date: Wed, 08 Sep 2004 15:54:51 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Andrew Gallatin <gallatin@cs.duke.edu>
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
	<16703.12410.319869.29996@grasshopper.cs.duke.edu>
	<413F55B8.50003@elischer.org>
	<16703.28031.454342.774229@grasshopper.cs.duke.edu>
In-Reply-To: <16703.28031.454342.774229@grasshopper.cs.duke.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc


Andrew Gallatin wrote:

>Julian Elischer writes:
> > It is possible. However, you should try this on -current (please),
> > because I rewrote some of the exit code
> > and may have already fixed it.
> > 
> > A -current kernel can generally run a 5.3 userland, so you may only need
> > to recompile the kernel.
>
>
>OK, I built a -current kernel from CVS sources dated 8am PDT.
>And it is worse.
>
>The initial skill -9 -u gallatin seems to be ignored by the threaded
>process and it gets re-parented to init when skill takes out its
>parent (sh) and its parent's parent (csh and sshd):
>
># ps axwl | grep ping | grep -v grep
> 1387   607     1 591 132  0 18260 11480 -      R     p0-   5:18.18 tests/mx_pingpong -e 2 -M 2 -E 3000000 -d scream:0
>
>
>Logging in again and doing 'kill -9 607' results in other stuff
>starting to hang. (I can't ssh in again, and kill never seems to return.)
>In the following ps, the shell that launched the second kill -9
>is pid 624 (^T also claims it's running).
>
>
>
>db> ps
>  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
>  624 c1a28c40 e6808000 1387   623   624 0004002 [CPU 0] csh
>  623 c1f24540 e8858000 1387   621   621 0000100 [SLPQ select 0xc06cb5c4][SLP] sshd
>  621 c1647a80 e52e3000    0   451   621 0000100 [SLPQ sbwait 0xc1990d40][SLP] sshd
>  607 c1a2d8c0 e680f000 1387     1   605 000c482 (threaded)  mx_pingpong
>   thread 0xc1f25960 ksegrp 0xc18808c0 [CPU 1]
>   thread 0xc1f2aaf0 ksegrp 0xc18808c0 [SUSP]
>   thread 0xc1f2a960 ksegrp 0xc18808c0 [RUNQ]
>   thread 0xc1f2a4b0 ksegrp 0xc1f282a0 [LOCK process lock c1b37bc0]
>
>
>db> tr 607
>sched_switch(c1f25960,c15b9000,c15b9000,ae1ed572,3db79502) at sched_switch+0xd8
>mi_switch(2,c15b9000,c15b9154,c15b9000,e884db50) at mi_switch+0x1c7
>maybe_preempt(c15b9000,82,0,c1568c40,c15b9000) at maybe_preempt+0x99
>sched_add(e884db70,46,c1f2a960,46,c18808c0) at sched_add+0x103
>resetpriority(e884db84,e680f000,46,46,c1a2d8c0) at resetpriority+0x62
>_end(c1f282a4,c1f25960,c1f2a970,c1f2a960,c1f2a988) at 0xc1f25960
>(null)(c1f282a0,c18808c4,c1f25960,c1f2a4b8,c1f2aaf0) at 0
>end(c1f28850,c1f28854,c1f25320,c1f25328,0) at 0xc1647a80
>end(c1880af0,c1880af4,c1a29af0,c1a29af8,0) at 0xc1a2d8c0
>_end(c1995000,c1995004,c187f7d0,c187f7d8,0) at 0xc1f24e00
> 
><_end() is repeated quite a few times>
>
>
>Is there any way to get a trace of the other threads from ddb?
>

Yes.

I think it is:

show thread (address)

but if you can get a coredump, that would be best.
In ddb do:
call doadump

In this case it looks like thread 0xc1f2aaf0 has called exit() and is
waiting for the others to exit.
I wonder if the lock is the answer. It would be good to follow the link
in the mutex in the proc structure at 0xc1a2d8c0
to see which thread OWNS it.
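A minimal sketch of what "following the link in the mutex" amounts to, assuming the FreeBSD 5.x mutex layout: the mtx_lock word of a struct mtx (for the process lock, the proc's p_mtx) holds a pointer to the owning struct thread, with flag bits packed into its low bits. The SKETCH_* names and their values below are illustrative stand-ins for MTX_RECURSED, MTX_CONTESTED and MTX_UNOWNED in <sys/mutex.h>; check them against the tree you are actually debugging. Given the lock word read out of the proc structure (e.g. with ddb's x/x examine command), this decodes the owner thread address, which can then be matched against the thread list printed by ps.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Assumed lock-word flag bits, modelled on FreeBSD 5.x <sys/mutex.h>.
 * These are illustrative stand-ins; verify them against your own tree.
 */
#define SKETCH_MTX_RECURSED  ((uintptr_t)0x1) /* owner holds it recursively */
#define SKETCH_MTX_CONTESTED ((uintptr_t)0x2) /* other threads are blocked on it */
#define SKETCH_MTX_UNOWNED   ((uintptr_t)0x4) /* cookie stored when nobody holds it */

/* Return the owning thread's address, or 0 if the mutex is free. */
uintptr_t sketch_mtx_owner(uintptr_t lockword)
{
	if (lockword == SKETCH_MTX_UNOWNED)
		return 0;
	/* Mask off the flag bits to recover the struct thread pointer. */
	return lockword & ~(SKETCH_MTX_RECURSED | SKETCH_MTX_CONTESTED);
}

/* Non-zero if some other thread is already waiting for this mutex. */
int sketch_mtx_contested(uintptr_t lockword)
{
	return (lockword & SKETCH_MTX_CONTESTED) != 0;
}

/* Print a one-line summary of a raw lock word. */
void sketch_describe(uintptr_t lockword)
{
	uintptr_t owner = sketch_mtx_owner(lockword);

	if (owner == 0) {
		printf("unowned\n");
		return;
	}
	printf("owner thread 0x%" PRIxPTR "%s%s\n", owner,
	    sketch_mtx_contested(lockword) ? ", contested" : "",
	    (lockword & SKETCH_MTX_RECURSED) ? ", recursed" : "");
}
```

For example, a hypothetical lock word of 0xc1000002 decodes to owner thread 0xc1000000 with the contested bit set, meaning at least one other thread is already blocked waiting for that lock.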


>
>Drew
>  
>