From owner-freebsd-current@FreeBSD.ORG Thu Jan 8 04:11:03 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DBE9116A4CE; Thu, 8 Jan 2004 04:11:03 -0800 (PST) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1998443D62; Thu, 8 Jan 2004 04:11:02 -0800 (PST) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9p2/8.12.9) with ESMTP id i08CAt7E019668; Thu, 8 Jan 2004 04:10:58 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200401081210.i08CAt7E019668@gw.catspoiler.org> Date: Thu, 8 Jan 2004 04:10:55 -0800 (PST) From: Don Lewis To: rwatson@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: current@FreeBSD.org Subject: Re: strace, holding sigacts lock over postsig(), et al. X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Jan 2004 12:11:04 -0000 On 7 Jan, Robert Watson wrote: > > Got a bug report this evening that the strace package hangs on 5-CURRENT. > I'm able to confirm this; for those that don't know, strace makes > extensive use of procfs. On attempting to reproduce it, I first got: > > crash2# strace ls > Sleeping on "stopevent" with the following non-sleepable locks held: > exclusive sleep mutex sigacts r = 0 (0xc20e2aa8) locked @ > kern/subr_trap.c:260 > lock order reversal > 1st 0xc20e2aa8 sigacts (sigacts) @ kern/subr_trap.c:260 > 2nd 0xc20f1224 process lock (process lock) @ kern/kern_synch.c:309 > Stack backtrace: > backtrace(c0864c4a,c20f1224,c0860e7b,c0860e7b,c0861ee5) at backtrace+0x17 > witness_lock(c20f1224,8,c0861ee5,135,c20f1224) at witness_lock+0x672 > _mtx_lock_flags(c20f1224,0,c0861edc,135,ffffffff) at _mtx_lock_flags+0xba > msleep(c20f12e8,c20f1224,5c,c0865441,0) at msleep+0x794 > stopevent(c20f11b8,2,13,823,c0922200) at stopevent+0x85 > issignal(c1f31bd0,2,c08619f7,bd,1) at issignal+0x168 > cursig(c1f31bd0,0,c0864399,104,0) at cursig+0xe8 > ast(c9520d48) at ast+0x4b0 > doreti_ast() at doreti_ast+0x17 > load: 0.21 cmd: strace 583 [iowait] 0.00u 0.91s 0% 724k > [sent a serial break] > > Cool, eh? Second try: > > crash2# strace ls > execve(0xbfbfe890, [0xbfbfed54], [/* 0 vars */]PIOCWSTOP: Input/output > error > > Even better. > > The first obvious observation is that holding mutexes other than the > process mutex over calls to _STOPEVENT() is a bad idea. It seems like the > p_sig mutex is used to cover a fair amount of flag handling, signal entry > changes, etc, etc. I'm not familiar with the semantic requirements here, > but presumably something needs to change. Is it possible to release the > locks after grabbing the value of 'action' (or even do a lock-free read), > and then grab the sigact lock only later during actual delivery, yet > maintain the right semantics? In both issignal() and postsig() I think it would be safe to drop the p_sig mutex before _STOPEVENT() and grab the mutex again afterwards. About the only thing that can happen during the interim would be the receipt of another signal and I don't think that would be a problem. Dropping the mutex is how issignal() handles ptracestop() a bit further down in the code.