From: Andrew Gallatin
Date: Thu, 16 Sep 2004 08:50:58 -0400 (EDT)
To: Julian Elischer
cc: John Baldwin
cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
Message-ID: <16713.35890.516192.596992@grasshopper.cs.duke.edu>
In-Reply-To: <414942B3.1060703@elischer.org>

Julian Elischer writes:
 > Andrew, please try -current on its own now..
 > I have checked in some fixes that have helped others.

I just tried, and had 2 different results: 2 system lockups, and one
lingering thread. This is with PREEMPTION. I'm going to try again in a
second w/o PREEMPTION.

The last system lockup was kinda interesting; here are some details.
For all my test setups, there has been one mx_pingpong running as root,
and one mx_pingpong running as me. After the skill, a vmstat (running
as root) kept going, and showed that the test was still running (like
the signal bounced off of it). Further confirmation is that the
mx_pingpong running as root exited normally, indicating that the other
side had run to completion.

I then killed vmstat and did a 'ps ax'. The ps got stuck on the
skill'ed mx_pingpong's proc lock (note the address passed to the
mtx_lock in the ps's frame).

At this point, it looked like this:

KDB: enter: Line break on console
[thread 100146]
Stopped at      kdb_enter+0x30: leave
db> sho pcpu
cpuid        = 0
curthread    = 0xc1a15960: pid 561 "ps"
curpcb       = 0xe67b2da0
fpcurthread  = none
idlethread   = 0xc1561640: pid 12 "idle: cpu0"
APIC ID      = 0
currentldt   = 0x30
db>  pid   proc     uarea   uid  ppid  pgrp  flag    stat  wmesg  wchan  cmd
     561 c1a14a80 e67de000    0   541   561 0004002 [CPU 0] ps
     551 c1647e00 e5321000 1387     1   549 000c482 (threaded)  mx_pingpong
      thread 0xc1646c80 ksegrp 0xc15ba690 [CPU 1]
      thread 0xc1646af0 ksegrp 0xc15ba690 [SUSP]
     541 c1a18c40 e67e8000    0   538   541 0004002 [SLPQ pause 0xc1a18c78][SLP] csh
<...>
db> tr
kdb_enter(c066f281,46,40,c16f3140,e67b2b14) at kdb_enter+0x30
siointr1(c1637800,0,c066f049,6ad,e67b2afc) at siointr1+0xd1
siointr(c1637800,0,c06a19a0,0,4) at siointr+0x35
intr_execute_handlers(c1556e90,e67b2b14,e67b2b74,c061bf53,34) at intr_execute_handlers+0xb8
lapic_handle_intr(34) at lapic_handle_intr+0x3b
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc04cd32b, esp = 0xe67b2b58, ebp = 0xe67b2b74 ---
_mtx_lock_sleep(c1647e6c,c1a15960,0,c065b894,3c5) at _mtx_lock_sleep+0x12e
_mtx_lock_flags(c1647e6c,0,c065b894,3c5,0) at _mtx_lock_flags+0x9f
sysctl_kern_proc(c0687d00,e67b2c88,0,e67b2c10,e67b2c10) at sysctl_kern_proc+0x241
sysctl_root(0,e67b2c7c,3,e67b2c10,c1a15960) at sysctl_root+0x13b
userland_sysctl(c1a15960,e67b2c7c,3,0,bfbfe28c) at userland_sysctl+0x11c
__sysctl(c1a15960,e67b2d14,18,8053000,6) at __sysctl+0xb0
syscall(2f,2f,2f,bfbfe28c,bfbfe2c0) at syscall+0x271
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x280f3ee7, esp = 0xbfbfe22c, ebp = 0xbfbfe258 ---

According to gdb:

0xc04d085d is in sysctl_kern_proc (../../../kern/kern_proc.c:965).
960                     if (p->p_state == PRS_NEW) {
961                             mtx_unlock_spin(&sched_lock);
962                             continue;
963                     }
964                     mtx_unlock_spin(&sched_lock);
965                     PROC_LOCK(p);
966                     /*
967                      * Show a user only appropriate processes.
968                      */
969                     if (p_cansee(curthread, p)) {

db> call db_trace_thread(0xc1646c80, -1)
sched_switch(c1646c80,c159f190,2,117,6a5c13ea) at sched_switch+0x16e
mi_switch(2,c1646c80,c1646c80,c06ad340,4) at mi_switch+0x2ad
maybe_preempt(e52d1bec,e52d1b78,c04e7482,c06ad340,c1646c80) at maybe_preempt+0x192
(null)(0,c1646c88,0,c1646c90,0) at 0x240
end(c15ba690,c15ba694,c1646c80,c1646af8,c1646af0) at 0xc15ba690
end(c15e4460,c15e4464,c187a960,c187a968,0) at 0xc1a14a80
<...>
db> call db_trace_thread(0xc1646af0, -1)
sched_switch(c1646af0,0,1,11d,4b34ccaa) at sched_switch+0x16e
mi_switch(1,0,c065cd70,335,c1647e6c) at mi_switch+0x2ad
thread_single(1,0,c0659772,88,e52cec70) at thread_single+0x1d7
exit1(c1646af0,9,c065c386,996,1) at exit1+0xd5
expand_name(c1646af0,9,c065c386,928,0) at expand_name
postsig(9,0,c065f070,100,1020800) at postsig+0x1e0
ast(e52ced48) at ast+0x46e
doreti_ast() at doreti_ast+0x17
0

Drew