Date: Tue, 22 Jan 2002 21:10:19 -0800 (PST)
From: Julian Elischer
To: Matthew Dillon
Cc: Alfred Perlstein, arch@freebsd.org, Bruce Evans, David Greenman
Subject: Re: PCATCH vs signal(SIGSTOP) (was Re: STOP and SLEEP in the kernel)
In-Reply-To: <200201230442.g0N4gCh03552@apollo.backplane.com>

On Tue, 22 Jan 2002, Matthew Dillon wrote:

> 
> :
> :* Matthew Dillon [020122 19:30] wrote:
> :> What really freaks me out is that if t/msleep() is called with PCATCH,
> :> it appears to process a STOP signal right then and there and actually
> :> stop the process rather than return.  t/msleep() is called all over the
> :> place with PCATCH while holding vnode and other lockmgr locks, so a ^Z
> :> at the wrong point could deadlock the system.
> :>
> :> "That can't be right," I said to myself and to Julian, but neither of us
> :> can see where the code might do something else.  As far as I can tell,
> :> the existing -stable and -current code *will* in fact STOP the process
> :> while potentially holding a lock (a vnode lock, for example).
> :> There is a whole lot of code, especially in NFS, that uses PCATCH.
> :> It can't be right.
> :
> :*ARRRRRRGH*
> :
> :Obviously STOP signals should only be honoured in userret and signal
> :entry points.  Any chance for a fix?
> :
> :-- 
> :-Alfred Perlstein [alfred@freebsd.org]
> 

What I've done in the KSE code is to remove the mi_switch() etc. from
issignal() and add a separate function in userret() that checks for a
STOPPED condition on the process and does the mi_switch() then. It seems
to work, but I seem to have screwed up the restarting code. :-)
I'll hopefully have that fixed tonight.

> I'm still not sure it even happens, but I can't find any code to
> prevent it.
>
> I would like somebody who's played with the signal code before, like
> BDE or DG, to take a look at the STOP/PCATCH handling.
>
> For those I've just added to the CC: Julian and I were looking at
> the STOP signal handling code and it appears that a tsleep()/msleep()
> called with PCATCH can cause the process to go into a STOPped state if
> it is signaled at just that moment, leaving held locks in place and
> potentially deadlocking the system.  There is a whole lot of code in
> the kernel that uses PCATCH and assumes that tsleep()/msleep() will
> return when a signal occurs rather than the process being stopped.
>
> If it is indeed happening the way I fear, the fix should be easy.  The
> question is... is it happening the way I fear?

I think it is happening, but most of the time it is benign, because most
long-term sleeps (the ones likely to be hit by a ^Z) do not hold a lot of
resources across the sleep; they are written knowing that they may be
sleeping for a long while. Certainly they are not holding locked items,
just references on vnodes etc. I think it happens, but it is not as bad
as our initial gut reaction suggested.
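A rough userland sketch of the change described above, for illustration only: it is not the KSE patch and not FreeBSD kernel code. Every name in it (toy_proc, toy_sleep_catch, toy_userret, toy_syscall, TOY_EINTR) is hypothetical, and it assumes that an interrupted PCATCH-style sleep returns an error so the caller unwinds and drops its locks before the stop is actually taken. It contrasts stopping inside the sleep with deferring the stop to the return-to-user path.

```c
#include <stdbool.h>

/*
 * Toy userland model, NOT FreeBSD kernel code: every name here is hypothetical.
 * It only contrasts stopping inside an interruptible sleep with deferring the
 * stop to the return-to-user path (the userret() idea).
 */
struct toy_proc {
    int  locks_held;         /* kernel locks (e.g. a vnode lock) currently held */
    bool stop_signal;        /* a SIGSTOP-like signal has been posted */
    bool stop_pending;       /* stop noted, to be acted on at user return */
    bool stopped;            /* the process has entered the stopped state */
    bool stopped_with_locks; /* the hazard discussed in this thread */
};

enum { TOY_OK = 0, TOY_EINTR = 4 };  /* hypothetical return codes */

/* Enter the stopped state (stands in for the mi_switch() to STOPPED). */
static void toy_stop(struct toy_proc *p)
{
    p->stopped = true;
    if (p->locks_held > 0)
        p->stopped_with_locks = true;
}

/* An interruptible (PCATCH-style) sleep; defer_stop picks the behaviour. */
static int toy_sleep_catch(struct toy_proc *p, bool defer_stop)
{
    if (!p->stop_signal)
        return TOY_OK;              /* woken up normally */
    p->stop_signal = false;
    if (!defer_stop) {
        toy_stop(p);                /* old: stop right here, locks still held */
        return TOY_OK;              /* and carry on once continued */
    }
    p->stop_pending = true;         /* proposed: only remember the stop */
    return TOY_EINTR;               /* and return so the caller unwinds */
}

/* Stands in for userret(): runs after the syscall has dropped its locks. */
static void toy_userret(struct toy_proc *p)
{
    if (p->stop_pending) {
        p->stop_pending = false;
        toy_stop(p);
    }
}

/* A hypothetical syscall that sleeps interruptibly while holding a lock. */
int toy_syscall(struct toy_proc *p, bool defer_stop)
{
    int error;

    p->locks_held++;                /* e.g. lock a vnode */
    error = toy_sleep_catch(p, defer_stop);
    p->locks_held--;                /* released on every path */
    toy_userret(p);
    return error;
}
```

With defer_stop false, toy_syscall() enters the stopped state while locks_held is still 1; with defer_stop true, the stop happens in toy_userret() after the lock count has dropped back to 0. In this model the price of deferring is that the sleep returns early with an error, so the caller must actually handle that path, which is exactly the assumption Matt says much PCATCH code already makes.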
> 
> -Matt
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message