From: Andrew Gallatin
Date: Thu, 16 Sep 2004 09:42:25 -0400 (EDT)
To: Julian Elischer
Cc: John Baldwin, freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
Message-ID: <16713.38977.864343.415015@grasshopper.cs.duke.edu>
In-Reply-To: <414942B3.1060703@elischer.org>
List-Id: Threading on FreeBSD

Julian Elischer writes:
 > Andrew, please try -current on its own now..
 > I have checked in some fixes that have helped others.

OK, preemption off... Still a system lockup, but a little different.

The interesting thing here is that continuing and breaking into the
debugger repeatedly seems to show that thread 0xc1646af0 is looping in
exit. I've seen him in thread_single, thread_suspend_check, and in
exit itself at kern_exit.c:163, etc. A breakpoint in
thread_suspend_one never triggers, so I guess he's holding the proc
lock and just looping forever. A breakpoint in _mtx_assert() shows
him asserting the proc lock in thread_suspend_check at
kern_thread.c:898. Over and over.

I don't know how to figure out where the other cpu-bound thread is.
A ktrace does not show it bouncing around in our driver's ioctl
handler. If you have a KTR mask you think might be helpful, I'd be
happy to build a ktr kernel to try to get more info from the thread on
CPU1.

Drew

[halt - sent]
KDB: enter: Line break on console
[thread 100097]
Stopped at      kdb_enter+0x30: leave
db> sho pcpu
cpuid        = 0
curthread    = 0xc1646af0: pid 575 "mx_pingpong"
curpcb       = 0xe52ceda0
fpcurthread  = none
idlethread   = 0xc1561640: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x30
db> tr
kdb_enter(c066f1a0,c063158a,a0,c16f3140,e52ceba8) at kdb_enter+0x30
siointr1(c1637800,0,c066ef68,6ad,e52ceb90) at siointr1+0xd1
siointr(c1637800,c06a18c0,c065cd10,e52ceb9c,4) at siointr+0x35
intr_execute_handlers(c1556e90,e52ceba8,e52cec08,c061bf03,34) at intr_execute_handlers+0xb8
lapic_handle_intr(34) at lapic_handle_intr+0x3b
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc04cd58d, esp = 0xe52cebec, ebp = 0xe52cec08 ---
_mtx_assert(c186de6c,1,c065cd10,382,c186de00) at _mtx_assert+0xc
thread_suspend_check(0,0,c0659712,88,e52cec68) at thread_suspend_check+0x59
exit1(c1646af0,9,c065c326,996,1) at exit1+0xc9
expand_name(c1646af0,9,c065c326,928,0) at expand_name
postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0
ast(e52ced48) at ast+0x46e
doreti_ast() at doreti_ast+0x17
db> ps
  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
  575 c186de00 e6772000 1387     1   573 000c482 (threaded)  mx_pingpong
  thread 0xc1646af0 ksegrp 0xc1871070 [CPU 0]
  thread 0xc1646c80 ksegrp 0xc1871070 [SUSP]
  thread 0xc1646e10 ksegrp 0xc1871070 [RUNQ]
  thread 0xc1648000 ksegrp 0xc15ba230 [CPU 1]
db> call db_trace_thread(0xc1646c80, 10)
sched_switch(c1646c80,c1646af0,1,11d,a273455a) at sched_switch+0x16e
mi_switch(1,c1646af0,c065cd10,335,c186de6c) at mi_switch+0x2ad
thread_single(1,0,c0659712,88,67e8ac52) at thread_single+0x1d7
exit1(c1646c80,9,c065c326,996,1) at exit1+0xd5
expand_name(c1646c80,9,c065c326,928,0) at expand_name
postsig(9,0,c065ef8f,100,1020800) at postsig+0x1e0
ast(e52d1d48) at ast+0x46e
doreti_ast() at doreti_ast+0x17
0
db> call db_trace_thread(0xc1646e10, 10)
sched_switch(c1646e10,0,2,117,8da55b4a) at sched_switch+0x16e
mi_switch(2,0,c065ef8f,f5,1010000) at mi_switch+0x2ad
ast(e52d4d48) at ast+0x3c1
doreti_ast() at doreti_ast+0x17
0
db> call db_trace_thread(0xc1648000, 10)
sched_switch(18e,3a99,c15ba230,1e,0) at sched_switch+0x16e
__func__.0() at __func__.0+0xacd5
0