From owner-freebsd-threads@FreeBSD.ORG  Thu Sep  9 19:08:27 2004
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id DEB7616A4CE; Thu,  9 Sep 2004 19:08:27 +0000 (GMT)
Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id A216A43D58; Thu,  9 Sep 2004 19:08:26 +0000 (GMT)
	(envelope-from julian@elischer.org)
Received: from elischer.org (julian.vicor-nb.com [208.206.78.97])
	by mail.vicor-nb.com (Postfix) with ESMTP
	id 31D977A3E1; Thu,  9 Sep 2004 12:08:26 -0700 (PDT)
Message-ID: <4140AA2A.90605@elischer.org>
Date: Thu, 09 Sep 2004 12:08:26 -0700
From: Julian Elischer <julian@elischer.org>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
X-Accept-Language: en, hu
MIME-Version: 1.0
To: Andrew Gallatin <gallatin@cs.duke.edu>
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
	<16703.12410.319869.29996@grasshopper.cs.duke.edu>
	<413F55B8.50003@elischer.org>
	<16703.28031.454342.774229@grasshopper.cs.duke.edu>
	<413F8DBB.5040502@elischer.org>
	<16704.40876.708925.425911@grasshopper.cs.duke.edu>
In-Reply-To: <16704.40876.708925.425911@grasshopper.cs.duke.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: John Baldwin <jhb@freebsd.org>
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Sep 2004 19:08:28 -0000

Thanks,
I'm flooded with work for a couple of days.

It looks as if one of the threads (0xc1b614b0) has called exit,
which means it is in thread_single()
waiting for all the other threads to suicide, but at least one of them
doesn't want to.

Two of them (0xc1b61320 and 0xc2b6ce10) are refusing to finish up and exit
because they need the proc lock, which is owned by a fourth one
(0xc1b617d0).

The fourth one has just preempted itself in favor of some other thread
(3244003328, which is 0xC15B9000 in hex). Do you still have the 'ps'
output? What is thread 0xC15B9000?
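
Since ddb prints the trace arguments in decimal while 'ps' prints thread
addresses in hex, a small helper makes the matching less error prone.
This is only a sketch: the function names ddb_arg_to_u32() and
format_hex() are made up for illustration, and it assumes 32-bit (i386)
pointers as on this machine.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one decimal argument as printed in a ddb stack trace and return
 * it as a 32-bit value (assumption: i386, so pointers are 32 bits wide).
 * Hypothetical helper, not part of ddb.
 */
static uint32_t
ddb_arg_to_u32(const char *decimal)
{
	assert(decimal != NULL);
	return ((uint32_t)strtoul(decimal, NULL, 10));
}

/*
 * Format a value the way 'ps' prints thread addresses: "0x" followed by
 * eight lowercase hex digits. Hypothetical helper.
 */
static void
format_hex(uint32_t value, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen >= sizeof("0x00000000"));
	snprintf(buf, buflen, "0x%08" PRIx32, value);
}
```

Feeding it the first two arguments of sched_switch() from the first
trace below (3249936336 and 3244003328) gives 0xc1b617d0, the thread
being switched out, and 0xc15b9000, the thread in question.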

The thread that holds the lock is the first one below.
[skip below for further comments.]


interestingly
Andrew Gallatin wrote:

>Julian Elischer writes:
> > 
> > I think it is
> > 
> > show thread (address)
>
>FWIW, I think db_trace(thread addr, -1) seems to work better.
>When I enter ddb, currproc is init, so show thread
>seems to show garbage.
>
>
>db> ps                                 
>  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
>
>   thread 0xc1b617d0 ksegrp 0xc18779a0 [CPU 1]
>   thread 0xc1b614b0 ksegrp 0xc18779a0 [SUSP]
>   thread 0xc1b61320 ksegrp 0xc18779a0 [LOCK process lock c1b13200]
>   thread 0xc2b6ce10 ksegrp 0xc1a270e0 [LOCK process lock c1b13200]
>
>db> call db_trace_thread(0xc1b617d0, -1)
>sched_switch(3249936336,3244003328,3244003328,468695918,1992661338) at sched_switch+216
>mi_switch(2,3244003328,3244003668,3244003328,3867700060) at mi_switch+455
>maybe_preempt(3244003328,252,0,3867700072,3226402603) at maybe_preempt+153
>sched_add(70,3867700092,3226402999,3246881184,3867189248) at sched_add+259
>end() at 3246881184
>0
>  
>

Odd that the stack trace stops there. That in itself is weird.
I don't understand why the thread is marked as currently running on
CPU 1. It called sched_switch(), which should have saved its state
and put it on the run queue (and marked it as such), so its state should
be RUNQ,
unless it has got into some infinite loop there, either going in or out
of the switchout.
It would be interesting to see the actual instruction pointer. Notice
that preemption is involved.

John may also have an idea (CC'd).


>db> call db_trace_thread(0xc1b614b0, -1)
>sched_switch(3249935536,3249936336,0,2929115342,3959095726) at sched_switch+216
>mi_switch(1,3249936336,0,0,0) at mi_switch+455
>thread_single(1,423437840,7706937,1737258498,3243666960) at thread_single+471
>exit1(3249935536,9,3867675836,3867675876,3226344614) at exit1+277
>expand_name(3249935536,9,256,0,0) at expand_name
>postsig(9,3867675976,2,3243701424,0) at postsig+516
>ast(3867675976) at ast+1508
>doreti_ast() at doreti_ast+23
>0
>
>db> call db_trace_thread(0xc1b61320, -1)
>sched_switch(3249935136,0,0,2147060238,4154263705) at sched_switch+216
>mi_switch(1,0,3249936336,3228346184,0) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3249935136) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3249935136,0,0,0) at _mtx_lock_sleep+290
>kse_release(3249935136,3867663636,4,3249935136,3867663676) at kse_release+322
>syscall(47,47,47,134562304,0) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 135876488, ebp = 135876548 ---
>0
>
>db> call db_trace_thread(0xc2b6ce10, -1)
>sched_switch(3266760208,0,0,2564282502,2143396982) at sched_switch+216
>mi_switch(1,0,3266760208,3244171108,3228328544) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3266760208) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3266760208,0,0,0) at _mtx_lock_sleep+290
>kse_release(3266760208,3901611284,4,3266760208,3901611324) at kse_release+322
>syscall(47,47,3215917103,1,129) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 3215978288, ebp = 3215978380 ---
>0
>
>
> > but if you can get a coredump it would be best..
> > in ddb do:
> > call doadump
> > 
> > in this case it looks like  thread 0xc1f2aaf0 has called exit() and is 
> > waiting for the others to exit..
> > I wonder if the lock is the answer.. it would be good to follow the link 
> > in the mutex in the proc structure at 0xc1a2d8c0
> > to see which thread OWNS it..
>
>I'm following it from 0xc1a22540 for today's lockup:
>
>(kgdb) p $proc->p_mtx
>$3 = {
>  mtx_object = {
>    lo_class = 0xc069e55c, 
>    lo_name = 0xc067788d "process lock", 
>    lo_type = 0xc067788d "process lock", 
>    lo_flags = 0x430000, 
>    lo_list = {
>      tqe_next = 0x0, 
>      tqe_prev = 0x0
>    }, 
>    lo_witness = 0x0
>  }, 
>  mtx_lock = 0xc1b617d2, 
>  mtx_recurse = 0x0
>}
>
>
>0xc1b617d2 is almost the same as the thread id of the
>first thread (0xc1b617d0)..
>
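
The "almost the same" is expected if the low bits of mtx_lock are used
as flag bits on top of the owning thread pointer. A minimal sketch of
the decoding, assuming bit 0 means "recursed" and bit 1 means
"contested" (the LOCK_* names and both helpers are invented for this
example; check sys/mutex.h in your tree for the real definitions):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of the mtx_lock word: the owning thread pointer with
 * flag bits in the two low-order bits. Names are hypothetical.
 */
#define	LOCK_RECURSED	0x1u	/* assumption: lock has been recursed */
#define	LOCK_CONTESTED	0x2u	/* assumption: other threads are waiting */
#define	LOCK_FLAGBITS	(LOCK_RECURSED | LOCK_CONTESTED)

/* Strip the flag bits to recover the owning thread address. */
static uint32_t
lock_owner(uint32_t mtx_lock)
{
	return (mtx_lock & ~LOCK_FLAGBITS);
}

/* Non-zero if the contested bit is set in the lock word. */
static int
lock_contested(uint32_t mtx_lock)
{
	return ((mtx_lock & LOCK_CONTESTED) != 0);
}

/* Non-zero if the recursed bit is set in the lock word. */
static int
lock_recursed(uint32_t mtx_lock)
{
	return ((mtx_lock & LOCK_RECURSED) != 0);
}
```

Under that assumption 0xc1b617d2 decodes to owner 0xc1b617d0 with the
contested bit set, which fits the two threads sitting in
turnstile_wait() on the process lock.
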
>I've still got the dump, so if you need more info please let me know.
>
>Drew