Date: Wed, 7 Jan 2004 23:45:53 -0500 (EST)
From: Robert Watson
To: current@FreeBSD.org
Subject: strace, holding sigacts lock over postsig(), et al.

Got a bug report this evening that the strace package hangs on 5-CURRENT.
I'm able to confirm this; for those that don't know, strace makes extensive
use of procfs.
On attempting to reproduce it, I first got:

  crash2# strace ls
  Sleeping on "stopevent" with the following non-sleepable locks held:
  exclusive sleep mutex sigacts r = 0 (0xc20e2aa8) locked @ kern/subr_trap.c:260
  lock order reversal
   1st 0xc20e2aa8 sigacts (sigacts) @ kern/subr_trap.c:260
   2nd 0xc20f1224 process lock (process lock) @ kern/kern_synch.c:309
  Stack backtrace:
  backtrace(c0864c4a,c20f1224,c0860e7b,c0860e7b,c0861ee5) at backtrace+0x17
  witness_lock(c20f1224,8,c0861ee5,135,c20f1224) at witness_lock+0x672
  _mtx_lock_flags(c20f1224,0,c0861edc,135,ffffffff) at _mtx_lock_flags+0xba
  msleep(c20f12e8,c20f1224,5c,c0865441,0) at msleep+0x794
  stopevent(c20f11b8,2,13,823,c0922200) at stopevent+0x85
  issignal(c1f31bd0,2,c08619f7,bd,1) at issignal+0x168
  cursig(c1f31bd0,0,c0864399,104,0) at cursig+0xe8
  ast(c9520d48) at ast+0x4b0
  doreti_ast() at doreti_ast+0x17
  load: 0.21  cmd: strace 583 [iowait] 0.00u 0.91s 0% 724k
  [sent a serial break]

Cool, eh?  Second try:

  crash2# strace ls
  execve(0xbfbfe890, [0xbfbfed54], [/* 0 vars */]PIOCWSTOP: Input/output error

Even better.  The first obvious observation is that holding mutexes other
than the process mutex over calls to _STOPEVENT() is a bad idea.  It seems
like the p_sig mutex is used to cover a fair amount of flag handling,
signal entry changes, etc, etc.  I'm not familiar with the semantic
requirements here, but presumably something needs to change.  Is it
possible to release the locks after grabbing the value of 'action' (or
even do a lock-free read), and then grab the sigact lock only later during
actual delivery, yet maintain the right semantics?

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
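
For illustration, the ordering asked about above (copy 'action' while
holding the lock, drop the lock before any point that may sleep, and take
the lock again only for the actual delivery) can be sketched in user-space
C with pthreads.  This is not kernel code and not a proposed patch:
struct sigacts_model, snapshot_action(), stop_point_is_lock_free() and
deliver_signal() are hypothetical names, and a pthread mutex merely stands
in for the sigacts mutex.  Whether this ordering keeps the right signal
semantics in the kernel is exactly the open question.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NSIG 32                 /* assumed table size, illustration only */

typedef void (*action_fn)(int);

/* Hypothetical user-space stand-in for the per-process sigacts. */
struct sigacts_model {
    pthread_mutex_t lock;             /* plays the role of the sigacts mutex */
    action_fn handler[MODEL_NSIG];    /* NULL means "no handler installed" */
};

static int last_delivered = 0;        /* set by the sample handler below */

static void record_signal(int sig) { last_delivered = sig; }

/* Step 1: read the disposition under the lock, then release the lock. */
static action_fn snapshot_action(struct sigacts_model *ps, int sig)
{
    if (sig < 0 || sig >= MODEL_NSIG)
        return NULL;
    pthread_mutex_lock(&ps->lock);
    action_fn action = ps->handler[sig];
    pthread_mutex_unlock(&ps->lock);
    return action;
}

/*
 * Step 2: stand-in for the place where the kernel might sleep (the
 * _STOPEVENT() call in the report).  It only checks that the model lock is
 * not held here; trylock fails with EBUSY if anyone, including the
 * caller, still holds it.
 */
static bool stop_point_is_lock_free(struct sigacts_model *ps)
{
    if (pthread_mutex_trylock(&ps->lock) != 0)
        return false;
    pthread_mutex_unlock(&ps->lock);
    return true;
}

/* Step 3: retake the lock for delivery and re-read the disposition,
 * because it may have changed while the lock was dropped. */
static bool deliver_signal(struct sigacts_model *ps, int sig)
{
    action_fn action = snapshot_action(ps, sig);
    if (action == NULL)
        return false;                 /* nothing to deliver */

    if (!stop_point_is_lock_free(ps)) /* would be a sleep with a lock held */
        return false;

    pthread_mutex_lock(&ps->lock);
    action = ps->handler[sig];
    pthread_mutex_unlock(&ps->lock);
    if (action == NULL)
        return false;                 /* handler was removed in the meantime */

    action(sig);
    return true;
}
```

The re-read in step 3 is the part that costs something: any decision made
from the first snapshot has to tolerate the disposition having changed by
the time the lock is taken again.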