From owner-freebsd-threads@FreeBSD.ORG Thu Sep 9 19:08:27 2004
Message-ID: <4140AA2A.90605@elischer.org>
Date: Thu, 09 Sep 2004 12:08:26 -0700
From: Julian Elischer
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516
To: Andrew Gallatin
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
 <16703.12410.319869.29996@grasshopper.cs.duke.edu>
 <413F55B8.50003@elischer.org>
 <16703.28031.454342.774229@grasshopper.cs.duke.edu>
 <413F8DBB.5040502@elischer.org>
 <16704.40876.708925.425911@grasshopper.cs.duke.edu>
In-Reply-To: <16704.40876.708925.425911@grasshopper.cs.duke.edu>
cc: John Baldwin
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
List-Id: Threading on FreeBSD

Thanks. I'm flooded with work for a couple of days..

It looks as if one of the threads (0xc1b614b0) has called exit, which
means it is in thread_single() waiting for all the other threads to
suicide, but at least one of them doesn't want to..
Two of them (0xc1b61320 and 0xc2b6ce10) are refusing to finish up and
exit because they need the proc lock, which is owned by a fourth one
(0xc1b617d0).

The fourth one has just preempted itself in favour of some other thread
(3244003328, which is 0xC15B9000 in hex).

Do you still have the 'ps'? What is thread 0xC15B9000?

The thread that holds the lock is the first one below..
[skip below for further comments.]

interestingly

Andrew Gallatin wrote:

>Julian Elischer writes:
> >
> > I think it is
> >
> > show thread (address)
>
>FWIW, I think db_trace(thread addr, -1) seems to work better.
>When I enter ddb, currproc is init, so show thread
>seems to show garbage.
>
>
>db> ps
>  pid  proc  uarea  uid  ppid  pgrp  flag  stat  wmesg  wchan  cmd
>
> thread 0xc1b617d0 ksegrp 0xc18779a0 [CPU 1]
> thread 0xc1b614b0 ksegrp 0xc18779a0 [SUSP]
> thread 0xc1b61320 ksegrp 0xc18779a0 [LOCK process lock c1b13200]
> thread 0xc2b6ce10 ksegrp 0xc1a270e0 [LOCK process lock c1b13200]
>
>db> call db_trace_thread(0xc1b617d0, -1)
>sched_switch(3249936336,3244003328,3244003328,468695918,1992661338) at sched_switch+216
>mi_switch(2,3244003328,3244003668,3244003328,3867700060) at mi_switch+455
>maybe_preempt(3244003328,252,0,3867700072,3226402603) at maybe_preempt+153
>sched_add(70,3867700092,3226402999,3246881184,3867189248) at sched_add+259
>end() at 3246881184
>0
>
>

Odd that the stack trace stops there?? That in itself is weird..

I don't understand why the thread is marked as currently running on
CPU 1. It called sched_switch(), which should have saved its state and
put it on the run queue (and marked it as such), so its state should be
RUNQ, unless it has got into some infinite loop there, either going
into or coming out of the switch-out. It would be interesting to see
the actual instruction pointer..

Notice that preemption is involved...
John may also have an idea..
(CC'd)

>db> call db_trace_thread(0xc1b614b0, -1)
>sched_switch(3249935536,3249936336,0,2929115342,3959095726) at sched_switch+216
>mi_switch(1,3249936336,0,0,0) at mi_switch+455
>thread_single(1,423437840,7706937,1737258498,3243666960) at thread_single+471
>exit1(3249935536,9,3867675836,3867675876,3226344614) at exit1+277
>expand_name(3249935536,9,256,0,0) at expand_name
>postsig(9,3867675976,2,3243701424,0) at postsig+516
>ast(3867675976) at ast+1508
>doreti_ast() at doreti_ast+23
>0
>
>db> call db_trace_thread(0xc1b61320, -1)
>sched_switch(3249935136,0,0,2147060238,4154263705) at sched_switch+216
>mi_switch(1,0,3249936336,3228346184,0) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3249935136) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3249935136,0,0,0) at _mtx_lock_sleep+290
>kse_release(3249935136,3867663636,4,3249935136,3867663676) at kse_release+322
>syscall(47,47,47,134562304,0) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 135876488, ebp = 135876548 ---
>0
>
>db> call db_trace_thread(0xc2b6ce10, -1)
>sched_switch(3266760208,0,0,2564282502,2143396982) at sched_switch+216
>mi_switch(1,0,3266760208,3244171108,3228328544) at mi_switch+455
>turnstile_wait(3249615360,3248629164,3249936336,3248629056,3266760208) at turnstile_wait+825
>_mtx_lock_sleep(3248629164,3266760208,0,0,0) at _mtx_lock_sleep+290
>kse_release(3266760208,3901611284,4,3266760208,3901611324) at kse_release+322
>syscall(47,47,3215917103,1,129) at syscall+764
>Xint0x80_syscall() at Xint0x80_syscall+31
>--- syscall (383, FreeBSD ELF32, kse_release), eip = 671759695, esp = 3215978288, ebp = 3215978380 ---
>0
>
> >
> > but if you can get a coredump it would be best..
> > in ddb do:
> > call doadump
> >
> > in this case it looks like thread 0xc1f2aaf0 has called exit() and is
> > waiting for the others to exit..
> > I wonder if the lock is the answer..
> > it would be good to follow the link
> > in the mutex in the proc structure at 0xc1a2d8c0
> > to see which thread OWNS it..
>
>I'm following it from 0xc1a22540 for today's lockup:
>
>(kgdb) p $proc->p_mtx
>$3 = {
>  mtx_object = {
>    lo_class = 0xc069e55c,
>    lo_name = 0xc067788d "process lock",
>    lo_type = 0xc067788d "process lock",
>    lo_flags = 0x430000,
>    lo_list = {
>      tqe_next = 0x0,
>      tqe_prev = 0x0
>    },
>    lo_witness = 0x0
>  },
>  mtx_lock = 0xc1b617d2,
>  mtx_recurse = 0x0
>}
>
>
>0xc1b617d2 is almost the same as the thread id of the
>first thread (0xc1b617d0)..
>
>I've still got the dump, so if you need more info please let me know.
>
>Drew
>_______________________________________________
>freebsd-threads@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-threads
>To unsubscribe, send any mail to "freebsd-threads-unsubscribe@freebsd.org"
>
>
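
For anyone following along, here is a rough sketch of the two bits of arithmetic used in this thread: turning the decimal arguments that ddb prints into hex thread addresses, and seeing why mtx_lock (0xc1b617d2) is "almost" the thread pointer 0xc1b617d0. It assumes the low bits of mtx_lock carry flags with the values MTX_RECURSED = 0x1 and MTX_CONTESTED = 0x2, as I believe sys/mutex.h defines them in this era of FreeBSD; check that against the sources the kernel was built from. The helper names ddb_arg_to_hex and decode_mtx_lock are made up for illustration and are not part of any FreeBSD tool.

```python
# Assumed flag bits kept in the low bits of mtx_lock (FreeBSD 5.x sys/mutex.h).
# These values are an assumption; verify them against the kernel sources.
MTX_RECURSED = 0x1
MTX_CONTESTED = 0x2
# i386 pointers are 32 bits wide, so mask the complement down to 32 bits.
MTX_FLAGMASK = ~(MTX_RECURSED | MTX_CONTESTED) & 0xFFFFFFFF


def ddb_arg_to_hex(value):
    """Render a decimal argument from a ddb stack trace as a 32-bit hex address."""
    return "0x%08x" % (value & 0xFFFFFFFF)


def decode_mtx_lock(mtx_lock):
    """Split an mtx_lock word into (owner thread address, list of flag names)."""
    owner = mtx_lock & MTX_FLAGMASK
    flags = []
    if mtx_lock & MTX_RECURSED:
        flags.append("MTX_RECURSED")
    if mtx_lock & MTX_CONTESTED:
        flags.append("MTX_CONTESTED")
    return owner, flags


if __name__ == "__main__":
    # Second argument of sched_switch() in the first trace: the thread switched to.
    print(ddb_arg_to_hex(3244003328))
    # mtx_lock value from the kgdb dump of p_mtx.
    owner, flags = decode_mtx_lock(0xC1B617D2)
    print(ddb_arg_to_hex(owner), flags)
```

If the assumed flag values are right, the owner comes out as 0xc1b617d0 with the contested bit set, which fits two threads sitting in turnstile_wait() on the process lock.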