Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 07 Oct 2002 18:44:55 -0400 (EDT)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Don Lewis <dl-freebsd@catspoiler.org>
Cc:        jmallett@FreeBSD.ORG, arch@FreeBSD.ORG
Subject:   Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
Message-ID:  <XFMail.20021007184455.jhb@FreeBSD.org>
In-Reply-To: <200210072123.g97LNGvU033246@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help

On 07-Oct-2002 Don Lewis wrote:
> On  7 Oct, John Baldwin wrote:
>> 
>> On 05-Oct-2002 Don Lewis wrote:
>>> On  5 Oct, Juli Mallett wrote:
>>> 
>>>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c
>>>> --- sys/kern/kern_exit.c     Tue Oct  1 12:15:51 2002
>>>> +++ kernel/kern/kern_exit.c  Sat Oct  5 01:20:57 2002
>>> 
>>>> @@ -209,12 +210,12 @@
>>>>      PROC_LOCK(p);
>>>>      if (p == p->p_leader) {
>>>>              q = p->p_peers;
>>>> +            PROC_UNLOCK(p);
>>>>              while (q != NULL) {
>>>> -                    PROC_LOCK(q);
>>>>                      psignal(q, SIGKILL);
>>>> -                    PROC_UNLOCK(q);
>>>>                      q = q->p_peers;
>>>>              }
>>>> +            PROC_LOCK(p);
>>>>              while (p->p_peers) 
>>>>                      msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>>>>      }
>>> 
>>> This scary looking fragment of code in exit1() is relying on the lock on
>>> p->p_leader being continuously held to prevent the p_peers list from
>>> changing while the list traversal is in progress.  The code in
>>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while
>>> the list modifications are done.
>>> 
>>> The existing code looks like it could deadlock if q is locked because it
>>> is in fork() or exit().  Process p will block when it tries to lock q,
>>> and q will block when it tries to lock its p_leader, which happens to be
>>> p.
>> 
>> Ugh.  Probably the code should be changed to do something like this:
>> 
>> --- kern_exit.c 2 Oct 2002 23:12:01 -0000       1.181
>> +++ kern_exit.c 7 Oct 2002 18:48:18 -0000
>> @@ -203,17 +203,18 @@
>>          */
>>  
>>         p->p_flag |= P_WEXIT;
>> -       PROC_UNLOCK(p);
>>  
>>         /* Are we a task leader? */
>> -       PROC_LOCK(p);
>>         if (p == p->p_leader) {
>>                 q = p->p_peers;
>>                 while (q != NULL) {
>> +                       nq = q->p_peers;
>> +                       PROC_UNLOCK(p);
>>                         PROC_LOCK(q);
>>                         psignal(q, SIGKILL);
>>                         PROC_UNLOCK(q);
>> -                       q = q->p_peers;
>> +                       PROC_LOCK(p);
>> +                       q = nq;
>>                 }
>>                 while (p->p_peers) 
>>                         msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
> 
> It's not obvious to me that your alternative is safe.  It avoids the
> deadlock problem, but what keeps the list from changing while it is
> being traversed, especially while we're waiting for PROC_LOCK(q)?  It
> separate lock for the peer list (instead of using PROC_LOCK(p_leader))
> looks like the obvious fix. Grabbing the peer list lock after unlocking
> P would avoid the deadlock and allow us to do whatever locking is needed
> for psignal().

Hmm, you are right.  Yuck.  *sigh*  I think we need to check P_WEXIT
in fork1() for this to really DTRT as well.

>> Also, we might should check P_WEXIT and abort in fork1() if it is
>> set.  (We don't appear to do that presently.)
>> 
> 
> Probably, but the list is also modified in the exit code.  All those
> processes that we are sending SIGKILL to are removing themselves from
> the list.

Processes dieing from SIGKILL that we send them aren't a problem since
we have already read their p_peers member before we kill them.  That's
the point of 'nq'.  The problem is that 'nq' could exit and could be
an invalid pointer.  If a process later in the list after 'nq' died
that is not a problem either.  Well, how about this:

http://www.FreeBSD.org/~jhb/patches/ppeers.patch

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.20021007184455.jhb>