Date: Tue, 22 Jan 2002 21:10:19 -0800 (PST)
From: Julian Elischer <julian@elischer.org>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Alfred Perlstein <bright@mu.org>, arch@freebsd.org, Bruce Evans <bde@zeta.org.au>, David Greenman <dg@root.com>
Subject: Re: PCATCH vs signal(SIGSTOP) (was Re: STOP and SLEEP in the kernel)
Message-ID: <Pine.BSF.4.21.0201222104310.20356-100000@InterJet.elischer.org>
In-Reply-To: <200201230442.g0N4gCh03552@apollo.backplane.com>

On Tue, 22 Jan 2002, Matthew Dillon wrote:

>
> :
> :* Matthew Dillon <dillon@apollo.backplane.com> [020122 19:30] wrote:
> :>     What really freaks me out is that if t/msleep() is called with PCATCH,
> :>     it appears to process a STOP signal right then and there and actually
> :>     stop the process rather than return.  t/msleep() is called all over the
> :>     place with PCATCH while holding vnode and other lockmgr locks so a ^Z
> :>     at the wrong point could deadlock the system.
> :>
> :>     "That can't be right" I said to myself and to Julian, but neither of us
> :>     can see where the code might do something else.  As far as I can tell the
> :>     existing -stable and -current code *will* in fact STOP the process
> :>     while potentially holding (a vnode lock for example).  There is a whole
> :>     lot of code, especially in NFS, that uses PCATCH.  It can't be right.
> :
> :*ARRRRRRGH*
> :
> :Obviously STOP signals should only be honoured in userret and signal
> :entry points.  Any chance for a fix?
> :
> :--
> :-Alfred Perlstein [alfred@freebsd.org]
>

What I've done in the KSE code is to remove the mi_switch() etc. from
issignal() and add a separate function in userret() that checks for a
STOPPED condition on the process and does the mi_switch() then.
It seems to work, but I seem to have screwed up the restarting code.. :-)
I'll hopefully have that fixed tonight.

>     I'm still not sure it even happens, but I can't find any code to
>     prevent it.
>
>     I would like somebody who's played with the signal code before, like
>     BDE or DG, to take a look at the STOP/PCATCH handling.
>
>     For those I've just added to the CC:  Julian and I were looking at
>     the STOP signal handling code and it appears that a tsleep()/msleep()
>     called with PCATCH can cause the process to go into a STOPped state if
>     it is signaled at just that moment, leaving held locks in place and
>     potentially deadlocking the system.  There is a whole lot of code in
>     the kernel that uses PCATCH and assumes that tsleep()/msleep() will
>     return when a signal occurs rather than the process being stopped.
>
>     If it is indeed happening the way I fear, the fix should be easy.  The
>     question is... is it happening the way I fear?

I think it is happening, but most of the time it is benign because most
long-term sleeps (the ones likely to be hit by ^Z) do not hold a lot of
resources across the sleep, since they are aware that they may be sleeping
for a long while. Certainly they are not holding locked items, just
references on vnodes etc.

I think it happens, but it is not as bad as our initial gut reaction
suggested.

>
>                                               -Matt
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message
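
The difference between stopping inside the sleep and deferring the stop to the user-return path can be illustrated with a small userland toy model. This is only a sketch under stated assumptions, not FreeBSD kernel code: every identifier below (`toy_proc`, `toy_sleep_old`, `toy_sleep_deferred`, `toy_userret`, `toy_syscall`, `TOY_INTERRUPTED`) is hypothetical, and the choice to have the deferred sleep return an error code is an assumption based on the thread's remark that callers expect the sleep to return when a signal occurs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userland model only; none of these names exist in the FreeBSD kernel. */
struct toy_proc {
    bool stop_pending;        /* a STOP signal has been posted */
    bool stopped;             /* the process has actually been suspended */
    int  locks_held;          /* locks currently held by the process */
    int  stopped_with_locks;  /* locks held at the moment it was suspended */
};

/* Hypothetical "sleep was interrupted by a signal" return value. */
#define TOY_INTERRUPTED 1

/* Suspend the process and record how many locks it was holding. */
void toy_stop(struct toy_proc *p)
{
    p->stop_pending = false;
    p->stopped = true;
    p->stopped_with_locks = p->locks_held;
}

/* Behaviour feared in the thread: the signal check inside the sleep
 * suspends the process on the spot, with its locks still held. */
int toy_sleep_old(struct toy_proc *p)
{
    if (p->stop_pending)
        toy_stop(p);
    return 0;
}

/* Deferred variant (assumption): the sleep only reports the signal and
 * leaves the stop pending for the user-return path. */
int toy_sleep_deferred(struct toy_proc *p)
{
    return p->stop_pending ? TOY_INTERRUPTED : 0;
}

/* Model of the user-return hook: by now all locks must be released,
 * so this is a safe place to act on a pending stop. */
void toy_userret(struct toy_proc *p)
{
    assert(p->locks_held == 0);
    if (p->stop_pending)
        toy_stop(p);
}

/* A system call that sleeps while holding one lock. */
int toy_syscall(struct toy_proc *p, int (*sleep_fn)(struct toy_proc *))
{
    p->locks_held++;             /* e.g. take a vnode lock */
    int error = sleep_fn(p);     /* interruptible sleep */
    p->locks_held--;             /* unwind before leaving the kernel */
    toy_userret(p);
    return error;
}
```

With `stop_pending` set beforehand, `toy_syscall(&p, toy_sleep_old)` records the stop while one lock is held, whereas `toy_syscall(&p, toy_sleep_deferred)` records it only in `toy_userret()`, after the lock has been dropped.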