From: Andrew Gallatin
Date: Thu, 9 Sep 2004 17:38:33 -0400 (EDT)
To: Julian Elischer
Cc: John Baldwin
Cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
Message-ID: <16704.52569.375858.857614@grasshopper.cs.duke.edu>
In-Reply-To: <4140C04D.1060906@elischer.org>

Julian Elischer
writes:
 > I think that this would possibly GO AWAY if you disabled preemption,
 > which would make it very hard to debug :-)

Nope, it still happens without preemption. And it's the "worse" problem
of deadlocking the system rather than just having the process fail to
exit.

db> ps
  pid   proc     uarea   uid  ppid  pgrp    flag  stat  wmesg  wchan  cmd
  579 c37e41c0 e8855000 1387   578   579 0004002 [SLPQ ttyin 0xc17df810][SLP] csh
  578 c1817540 e671a000 1387   576   576 0000100 [SLPQ select 0xc06cb704][SLP] sshd
  576 c37e4540 e8857000    0   451   576 0000100 [SLPQ sbwait 0xc1983e84][SLP] sshd
  566 c1a1fc40 e67ba000 1387     1   564 000c482 (threaded)  mx_pingpong
              thread 0xc37944b0 ksegrp 0xc1a20460 [CPU 0]
              thread 0xc3794640 ksegrp 0xc1a20460 [SUSP]
              thread 0xc187e320 ksegrp 0xc1a20460 [RUNQ]
              thread 0xc187e4b0 ksegrp 0xc187fee0 [CPU 1]

db> call db_trace_thread(0xc37944b0, -1)
kdb_enter(c0686ceb,c0645179,fc,c37944b0,c16bd000) at kdb_enter+0x30
siointr1(c16bd000,2,fc,e8842ba0,c0650df2) at siointr1+0xd1
siointr(c16bd000,c37944b0,c1a1fc40,4,c37944b0) at siointr+0x77
intr_execute_handlers(c1556e90,e8842be0,e8842c28,c0639f53,34) at intr_execute_handlers+0x8d
lapic_handle_intr(34) at lapic_handle_intr+0x3b
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc04ea44a, esp = 0xe8842c24, ebp = 0xe8842c28 ---
thread_suspend_check(0,246,e8842c60,c0501b86,c37944b0) at thread_suspend_check+0x21f
exit1(c37944b0,9,0,0,c04e1e66) at exit1+0x109
expand_name(c37944b0,9,100,0,0) at expand_name
postsig(9,c37944b0,0,0,0) at postsig+0x204
ast(e8842d48) at ast+0x5e4
doreti_ast() at doreti_ast+0x17
0

db> call db_trace_thread(0xc3794640, -1)
sched_switch(c3794640,c37944b0,0,94bc2c2e,2227b660) at sched_switch+0xd8
mi_switch(1,c37944b0,0,0,0) at mi_switch+0x1c7
thread_single(1,c3794640,0,0,0) at thread_single+0x1d7
exit1(c3794640,9,e8845cbc,e8845ce4,c04e1e66) at exit1+0x115
expand_name(c3794640,9,100,0,0) at expand_name
postsig(9,c3794640,0,0,0) at postsig+0x204
ast(e8845d48) at ast+0x5e4
doreti_ast() at doreti_ast+0x17
0

db> call db_trace_thread(0xc187e320, -1)
sched_switch(c187e320,0,0,9a67657e,e2359ef4) at sched_switch+0xd8
mi_switch(2,0,0,0,0) at mi_switch+0x1c7
ast(e6749d48) at ast+0x4eb
doreti_ast() at doreti_ast+0x17
0

db> call db_trace_thread(0xc187e4b0, -1)
sched_switch(c187fee0,1e,0,1e,0) at sched_switch+0xd8
0

db> show pcpu
cpuid        = 0
curthread    = 0xc37944b0: pid 566 "mx_pingpong"
curpcb       = 0xe8842da0
fpcurthread  = none
idlethread   = 0xc1561640: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x30

db> show pcpu 1
cpuid        = 1
curthread    = 0xc187e4b0: pid 566 "mx_pingpong"
curpcb       = 0xe674cda0
fpcurthread  = none
idlethread   = 0xc15614b0: pid 11 "idle: cpu1"
APIC ID      = 1
currentldt   = 0x30

According to kgdb, the holder of the proc lock is 0xc37944b0, the
thread currently running on CPU 0:

(kgdb) p/x td->td_proc->p_mtx->mtx_lock
$8 = 0xc37944b2

Maybe it's some sort of spinlock deadlock. I'm going to enable witness
and try again.

Drew
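
For anyone wondering how 0xc37944b2 turns into thread 0xc37944b0: a
minimal sketch of decoding the mtx_lock word, assuming the low two bits
are the recursed (0x1) and contested (0x2) flags as I believe
sys/mutex.h defines them in this era. The SKETCH_* names and helper
functions are made up for illustration and are not kernel API; check
the flag values against your own tree.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * ASSUMPTION: flag bits believed to match MTX_RECURSED / MTX_CONTESTED
 * in sys/mutex.h of this era. Hypothetical names, not kernel API.
 */
#define SKETCH_MTX_RECURSED   0x1u
#define SKETCH_MTX_CONTESTED  0x2u
#define SKETCH_MTX_FLAGMASK   (~(SKETCH_MTX_RECURSED | SKETCH_MTX_CONTESTED))

/* Owning thread pointer: the lock word with the flag bits masked off. */
static inline uint32_t
sketch_mtx_owner(uint32_t lock_word)
{
	return (lock_word & SKETCH_MTX_FLAGMASK);
}

/* Nonzero if the contested flag is set in the lock word. */
static inline int
sketch_mtx_contested(uint32_t lock_word)
{
	return ((lock_word & SKETCH_MTX_CONTESTED) != 0);
}

/* Nonzero if the recursed flag is set in the lock word. */
static inline int
sketch_mtx_recursed(uint32_t lock_word)
{
	return ((lock_word & SKETCH_MTX_RECURSED) != 0);
}

/* Print the decoded fields of a 32-bit (i386) lock word. */
static inline void
sketch_mtx_print(uint32_t lock_word)
{
	printf("owner=0x%08x contested=%d recursed=%d\n",
	    (unsigned)sketch_mtx_owner(lock_word),
	    sketch_mtx_contested(lock_word),
	    sketch_mtx_recursed(lock_word));
}
```

Under that assumption, sketch_mtx_owner(0xc37944b2) masks down to
0xc37944b0, the thread on CPU 0, with the contested bit set and the
recursed bit clear.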