Date:      Tue, 22 Jan 2002 21:10:19 -0800 (PST)
From:      Julian Elischer <julian@elischer.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Alfred Perlstein <bright@mu.org>, arch@freebsd.org, Bruce Evans <bde@zeta.org.au>, David Greenman <dg@root.com>
Subject:   Re: PCATCH vs signal(SIGSTOP) (was Re: STOP and SLEEP in the kernel)
Message-ID:  <Pine.BSF.4.21.0201222104310.20356-100000@InterJet.elischer.org>
In-Reply-To: <200201230442.g0N4gCh03552@apollo.backplane.com>

On Tue, 22 Jan 2002, Matthew Dillon wrote:

> 
> :
> :* Matthew Dillon <dillon@apollo.backplane.com> [020122 19:30] wrote:
> :>     What really freaks me out is that if t/msleep() is called with PCATCH,
> :>     it appears to process a STOP signal right then and there and actually
> :>     stop the process rather than return.  t/msleep() is called all over the
> :>     place with PCATCH while holding vnode and other lockmgr locks so a ^Z
> :>     at the wrong point could deadlock the system.
> :> 
> :>     "That can't be right" I said to myself and to Julian, but neither of us
> :>     can see where the code might do something else.  As far as I can tell the
> :>     existing -stable and -current code *will* in fact STOP the process
> :>     while potentially holding locks (a vnode lock, for example).  There is a whole
> :>     lot of code, especially in NFS, that uses PCATCH.  It can't be right.
> :
> :*ARRRRRRGH*
> :
> :Obviously STOP signals should only be honoured in userret and signal
> :entry points.  Any chance for a fix?
> :
> :-- 
> :-Alfred Perlstein [alfred@freebsd.org]
> 

What I've done in the KSE code is remove the mi_switch() etc.
from issignal() and add a separate function in userret() that
checks for a STOPPED condition on the process and does the
mi_switch() then. It seems to work, but I seem to have screwed
up the restarting code. :-)
I'll hopefully have that fixed tonight.
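
To make the idea concrete, here is a rough userspace toy, not the
actual KSE change. Every name in it (toy_proc, toy_sleep_old,
toy_sleep_deferred, toy_userret, toy_syscall) is made up for
illustration, and having the deferred sleep return EINTR is an
assumption of the sketch rather than something the real code is
known to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model. None of these names are real kernel interfaces. */
struct toy_proc {
	bool stop_pending;	/* a STOP signal has been posted */
	bool stopped;		/* the process has actually been stopped */
	int  locks_held;	/* lockmgr-style locks currently held */
};

/* Old behaviour: a PCATCH sleep acts on the STOP right away. */
int
toy_sleep_old(struct toy_proc *p, bool pcatch)
{
	if (pcatch && p->stop_pending) {
		p->stop_pending = false;
		p->stopped = true;	/* may happen with locks_held > 0 */
	}
	return (0);
}

/*
 * Deferred behaviour: the sleep only reports the interruption.
 * Returning EINTR here is an assumption of this sketch.
 */
int
toy_sleep_deferred(struct toy_proc *p, bool pcatch)
{
	if (pcatch && p->stop_pending)
		return (EINTR);		/* stop_pending stays set */
	return (0);
}

/* Stand-in for the check done on the way back to user mode. */
void
toy_userret(struct toy_proc *p)
{
	assert(p->locks_held == 0);	/* nothing held at this point */
	if (p->stop_pending) {
		p->stop_pending = false;
		p->stopped = true;	/* where mi_switch() would go */
	}
}

/*
 * A made-up syscall body that sleeps while holding one lock.
 * *stopped_locked records whether the stop landed under the lock.
 */
int
toy_syscall(struct toy_proc *p, int (*sleepfn)(struct toy_proc *, bool),
    bool *stopped_locked)
{
	int error;

	p->locks_held++;
	error = sleepfn(p, true);
	*stopped_locked = p->stopped && p->locks_held > 0;
	p->locks_held--;
	return (error);
}
```

Running toy_syscall() with toy_sleep_old stops the toy process while
the lock is still held. With toy_sleep_deferred the call unwinds
first, and the stop only takes effect in toy_userret(), once nothing
is held.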


>     I'm still not sure it even happens, but I can't find any code to
>     prevent it.  
> 
>     I would like somebody who's played with the signal code before, like
>     BDE or DG, to take a look at the STOP/PCATCH handling.
> 
>     For those I've just added to the CC:   Julian and I were looking at
>     the STOP signal handling code and it appears that a tsleep()/msleep()
>     called with PCATCH can cause the process to go into a STOPped state if
>     it is signaled at just that moment, leaving held locks in place and
>     potentially deadlocking the system.  There is a whole lot of code in
>     the kernel that uses PCATCH and assumes that tsleep()/msleep() will
>     return when a signal occurs rather than the process being stopped.
> 
>     If it is indeed happening the way I fear, the fix should be easy.  The
>     question is... is it happening the way I fear?

I think it is happening, but most of the time it is benign because
most long-term sleeps (the ones likely to be hit by ^Z) do not hold a
lot of resources across the sleep, since they are aware that they may
be sleeping for a long while. Certainly they are not holding locked
items, just references on vnodes etc.

I think it happens, but it is not as bad as our initial gut reaction suggested.
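
The difference between holding a reference and holding a lock across
a sleep can be shown with another small toy. This is only a sketch:
toy_vnode and the toy_v* helpers are invented for illustration and
are not the real vnode interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vnode: just a reference count and a lock flag. */
struct toy_vnode {
	int  refcount;
	bool locked;
};

/* Take a reference so the vnode cannot go away. */
void
toy_vref(struct toy_vnode *vp)
{
	vp->refcount++;
}

/* Drop a reference taken with toy_vref(). */
void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->refcount > 0);
	vp->refcount--;
}

/* Try to take the lock; fails if somebody already holds it. */
bool
toy_vtrylock(struct toy_vnode *vp)
{
	if (vp->locked)
		return (false);
	vp->locked = true;
	return (true);
}

/* Release the lock. */
void
toy_vunlock(struct toy_vnode *vp)
{
	assert(vp->locked);
	vp->locked = false;
}

/*
 * What a well-behaved long-term sleeper does before sleeping:
 * keep the vnode alive with a reference, but give up the lock.
 */
void
toy_prepare_long_sleep(struct toy_vnode *vp)
{
	toy_vref(vp);
	toy_vunlock(vp);
}
```

If the sleeper is stopped after toy_prepare_long_sleep(), another
process can still take the lock with toy_vtrylock(). If it had kept
the lock instead, every other locker would be stuck until the stopped
process was continued.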




> 
> 						-Matt
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-arch" in the body of the message
> 





