Date: Mon, 07 Oct 2002 18:44:55 -0400 (EDT) From: John Baldwin <jhb@FreeBSD.org> To: Don Lewis <dl-freebsd@catspoiler.org> Cc: jmallett@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc., Message-ID: <XFMail.20021007184455.jhb@FreeBSD.org> In-Reply-To: <200210072123.g97LNGvU033246@gw.catspoiler.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 07-Oct-2002 Don Lewis wrote:
> On 7 Oct, John Baldwin wrote:
>>
>> On 05-Oct-2002 Don Lewis wrote:
>>> On 5 Oct, Juli Mallett wrote:
>>>
>>>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c
>>>> --- sys/kern/kern_exit.c Tue Oct 1 12:15:51 2002
>>>> +++ kernel/kern/kern_exit.c Sat Oct 5 01:20:57 2002
>>>
>>>> @@ -209,12 +210,12 @@
>>>> PROC_LOCK(p);
>>>> if (p == p->p_leader) {
>>>> q = p->p_peers;
>>>> + PROC_UNLOCK(p);
>>>> while (q != NULL) {
>>>> - PROC_LOCK(q);
>>>> psignal(q, SIGKILL);
>>>> - PROC_UNLOCK(q);
>>>> q = q->p_peers;
>>>> }
>>>> + PROC_LOCK(p);
>>>> while (p->p_peers)
>>>> msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>>>> }
>>>
>>> This scary looking fragment of code in exit1() is relying on the lock on
>>> p->p_leader being continuously held to prevent the p_peers list from
>>> changing while the list traversal is in progress. The code in
>>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while
>>> the list modifications are done.
>>>
>>> The existing code looks like it could deadlock if q is locked because it
>>> is in fork() or exit(). Process p will block when it tries to lock q,
>>> and q will block when it tries to lock its p_leader, which happens to be
>>> p.
>>
>> Ugh. Probably the code should be changed to do something like this:
>>
>> --- kern_exit.c 2 Oct 2002 23:12:01 -0000 1.181
>> +++ kern_exit.c 7 Oct 2002 18:48:18 -0000
>> @@ -203,17 +203,18 @@
>> */
>>
>> p->p_flag |= P_WEXIT;
>> - PROC_UNLOCK(p);
>>
>> /* Are we a task leader? */
>> - PROC_LOCK(p);
>> if (p == p->p_leader) {
>> q = p->p_peers;
>> while (q != NULL) {
>> + nq = q->p_peers;
>> + PROC_UNLOCK(p);
>> PROC_LOCK(q);
>> psignal(q, SIGKILL);
>> PROC_UNLOCK(q);
>> - q = q->p_peers;
>> + PROC_LOCK(p);
>> + q = nq;
>> }
>> while (p->p_peers)
>> msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>
> It's not obvious to me that your alternative is safe. It avoids the
> deadlock problem, but what keeps the list from changing while it is
> being traversed, especially while we're waiting for PROC_LOCK(q)? It
> separate lock for the peer list (instead of using PROC_LOCK(p_leader))
> looks like the obvious fix. Grabbing the peer list lock after unlocking
> P would avoid the deadlock and allow us to do whatever locking is needed
> for psignal().
Hmm, you are right. Yuck. *sigh* I think we need to check P_WEXIT
in fork1() for this to really DTRT as well.
>> Also, we might should check P_WEXIT and abort in fork1() if it is
>> set. (We don't appear to do that presently.)
>>
>
> Probably, but the list is also modified in the exit code. All those
> processes that we are sending SIGKILL to are removing themselves from
> the list.
Processes dieing from SIGKILL that we send them aren't a problem since
we have already read their p_peers member before we kill them. That's
the point of 'nq'. The problem is that 'nq' could exit and could be
an invalid pointer. If a process later in the list after 'nq' died
that is not a problem either. Well, how about this:
http://www.FreeBSD.org/~jhb/patches/ppeers.patch
--
John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?XFMail.20021007184455.jhb>
