Date: Thu, 8 Jan 2004 04:10:55 -0800 (PST) From: Don Lewis <truckman@FreeBSD.org> To: rwatson@FreeBSD.org Cc: current@FreeBSD.org Subject: Re: strace, holding sigacts lock over postsig(), et al. Message-ID: <200401081210.i08CAt7E019668@gw.catspoiler.org> In-Reply-To: <Pine.NEB.3.96L.1040107233603.13166H-100000@fledge.watson.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 7 Jan, Robert Watson wrote: > > Got a bug report this evening that the strace package hangs on 5-CURRENT. > I'm able to confirm this; for those that don't know, strace makes > extensive use of procfs. On attempting to reproduce it, I first got: > > crash2# strace ls > Sleeping on "stopevent" with the following non-sleepable locks held: > exclusive sleep mutex sigacts r = 0 (0xc20e2aa8) locked @ > kern/subr_trap.c:260 > lock order reversal > 1st 0xc20e2aa8 sigacts (sigacts) @ kern/subr_trap.c:260 > 2nd 0xc20f1224 process lock (process lock) @ kern/kern_synch.c:309 > Stack backtrace: > backtrace(c0864c4a,c20f1224,c0860e7b,c0860e7b,c0861ee5) at backtrace+0x17 > witness_lock(c20f1224,8,c0861ee5,135,c20f1224) at witness_lock+0x672 > _mtx_lock_flags(c20f1224,0,c0861edc,135,ffffffff) at _mtx_lock_flags+0xba > msleep(c20f12e8,c20f1224,5c,c0865441,0) at msleep+0x794 > stopevent(c20f11b8,2,13,823,c0922200) at stopevent+0x85 > issignal(c1f31bd0,2,c08619f7,bd,1) at issignal+0x168 > cursig(c1f31bd0,0,c0864399,104,0) at cursig+0xe8 > ast(c9520d48) at ast+0x4b0 > doreti_ast() at doreti_ast+0x17 > load: 0.21 cmd: strace 583 [iowait] 0.00u 0.91s 0% 724k > [sent a serial break] > > Cool, eh? Second try: > > crash2# strace ls > execve(0xbfbfe890, [0xbfbfed54], [/* 0 vars */]PIOCWSTOP: Input/output > error > > Even better. > > The first obvious observation is that holding mutexes other than the > process mutex over calls to _STOPEVENT() is a bad idea. It seems like the > p_sig mutex is used to cover a fair amount of flag handling, signal entry > changes, etc, etc. I'm not familiar with the semantic requirements here, > but presumably something needs to change. Is it possible to release the > locks after grabbing the value of 'action' (or even do a lock-free read), > and then grab the sigact lock only later during actual delivery, yet > maintain the right semantics? In both issignal() and postsig() I think it would be safe to drop the p_sig mutex before _STOPEVENT() and grab the mutex again afterwards. About the only thing that can happen during the interim would be the receipt of another signal and I don't think that would be a problem. Dropping the mutex is how issignal() handles ptracestop() a bit further down in the code.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200401081210.i08CAt7E019668>