Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 7 Oct 2002 14:23:16 -0700 (PDT)
From:      Don Lewis <dl-freebsd@catspoiler.org>
To:        jhb@FreeBSD.ORG
Cc:        arch@FreeBSD.ORG, jmallett@FreeBSD.ORG
Subject:   Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
Message-ID:  <200210072123.g97LNGvU033246@gw.catspoiler.org>
In-Reply-To: <XFMail.20021007145442.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On  7 Oct, John Baldwin wrote:
> 
> On 05-Oct-2002 Don Lewis wrote:
>> On  5 Oct, Juli Mallett wrote:
>> 
>>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c
>>> --- sys/kern/kern_exit.c     Tue Oct  1 12:15:51 2002
>>> +++ kernel/kern/kern_exit.c  Sat Oct  5 01:20:57 2002
>> 
>>> @@ -209,12 +210,12 @@
>>>      PROC_LOCK(p);
>>>      if (p == p->p_leader) {
>>>              q = p->p_peers;
>>> +            PROC_UNLOCK(p);
>>>              while (q != NULL) {
>>> -                    PROC_LOCK(q);
>>>                      psignal(q, SIGKILL);
>>> -                    PROC_UNLOCK(q);
>>>                      q = q->p_peers;
>>>              }
>>> +            PROC_LOCK(p);
>>>              while (p->p_peers) 
>>>                      msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>>>      }
>> 
>> This scary looking fragment of code in exit1() is relying on the lock on
>> p->p_leader being continuously held to prevent the p_peers list from
>> changing while the list traversal is in progress.  The code in
>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while
>> the list modifications are done.
>> 
>> The existing code looks like it could deadlock if q is locked because it
>> is in fork() or exit().  Process p will block when it tries to lock q,
>> and q will block when it tries to lock its p_leader, which happens to be
>> p.
> 
> Ugh.  Probably the code should be changed to do something like this:
> 
> --- kern_exit.c 2 Oct 2002 23:12:01 -0000       1.181
> +++ kern_exit.c 7 Oct 2002 18:48:18 -0000
> @@ -203,17 +203,18 @@
>          */
>  
>         p->p_flag |= P_WEXIT;
> -       PROC_UNLOCK(p);
>  
>         /* Are we a task leader? */
> -       PROC_LOCK(p);
>         if (p == p->p_leader) {
>                 q = p->p_peers;
>                 while (q != NULL) {
> +                       nq = q->p_peers;
> +                       PROC_UNLOCK(p);
>                         PROC_LOCK(q);
>                         psignal(q, SIGKILL);
>                         PROC_UNLOCK(q);
> -                       q = q->p_peers;
> +                       PROC_LOCK(p);
> +                       q = nq;
>                 }
>                 while (p->p_peers) 
>                         msleep(p, &p->p_mtx, PWAIT, "exit1", 0);

It's not obvious to me that your alternative is safe.  It avoids the
deadlock problem, but what keeps the list from changing while it is
being traversed, especially while we're waiting for PROC_LOCK(q)?  It
separate lock for the peer list (instead of using PROC_LOCK(p_leader))
looks like the obvious fix. Grabbing the peer list lock after unlocking
P would avoid the deadlock and allow us to do whatever locking is needed
for psignal().

> Also, we might should check P_WEXIT and abort in fork1() if it is
> set.  (We don't appear to do that presently.)
> 

Probably, but the list is also modified in the exit code.  All those
processes that we are sending SIGKILL to are removing themselves from
the list.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200210072123.g97LNGvU033246>