Date: Mon, 07 Oct 2002 18:44:55 -0400 (EDT)
From: John Baldwin <jhb@FreeBSD.org>
To: Don Lewis <dl-freebsd@catspoiler.org>
Cc: jmallett@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: [jmallett@FreeBSD.org: [PATCH] Reliable signal queues, etc.,
Message-ID: <XFMail.20021007184455.jhb@FreeBSD.org>
In-Reply-To: <200210072123.g97LNGvU033246@gw.catspoiler.org>
On 07-Oct-2002 Don Lewis wrote:
> On 7 Oct, John Baldwin wrote:
>>
>> On 05-Oct-2002 Don Lewis wrote:
>>> On 5 Oct, Juli Mallett wrote:
>>>
>>>> diff -Nrdu -x *CVS* -x *dev* sys/kern/kern_exit.c kernel/kern/kern_exit.c
>>>> --- sys/kern/kern_exit.c	Tue Oct  1 12:15:51 2002
>>>> +++ kernel/kern/kern_exit.c	Sat Oct  5 01:20:57 2002
>>>
>>>> @@ -209,12 +210,12 @@
>>>>  	PROC_LOCK(p);
>>>>  	if (p == p->p_leader) {
>>>>  		q = p->p_peers;
>>>> +		PROC_UNLOCK(p);
>>>>  		while (q != NULL) {
>>>> -			PROC_LOCK(q);
>>>>  			psignal(q, SIGKILL);
>>>> -			PROC_UNLOCK(q);
>>>>  			q = q->p_peers;
>>>>  		}
>>>> +		PROC_LOCK(p);
>>>>  		while (p->p_peers)
>>>>  			msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>>>>  	}
>>>
>>> This scary looking fragment of code in exit1() is relying on the lock on
>>> p->p_leader being continuously held to prevent the p_peers list from
>>> changing while the list traversal is in progress. The code in
>>> kern_fork.c and elsewhere in kern_exit.c holds a lock on p_leader while
>>> the list modifications are done.
>>>
>>> The existing code looks like it could deadlock if q is locked because it
>>> is in fork() or exit(). Process p will block when it tries to lock q,
>>> and q will block when it tries to lock its p_leader, which happens to be
>>> p.
>>
>> Ugh. Probably the code should be changed to do something like this:
>>
>> --- kern_exit.c	2 Oct 2002 23:12:01 -0000	1.181
>> +++ kern_exit.c	7 Oct 2002 18:48:18 -0000
>> @@ -203,17 +203,18 @@
>>  	 */
>>  
>>  	p->p_flag |= P_WEXIT;
>> -	PROC_UNLOCK(p);
>>  
>>  	/* Are we a task leader? */
>> -	PROC_LOCK(p);
>>  	if (p == p->p_leader) {
>>  		q = p->p_peers;
>>  		while (q != NULL) {
>> +			nq = q->p_peers;
>> +			PROC_UNLOCK(p);
>>  			PROC_LOCK(q);
>>  			psignal(q, SIGKILL);
>>  			PROC_UNLOCK(q);
>> -			q = q->p_peers;
>> +			PROC_LOCK(p);
>> +			q = nq;
>>  		}
>>  		while (p->p_peers)
>>  			msleep(p, &p->p_mtx, PWAIT, "exit1", 0);
>
> It's not obvious to me that your alternative is safe.
> It avoids the deadlock problem, but what keeps the list from changing
> while it is being traversed, especially while we're waiting for
> PROC_LOCK(q)? A separate lock for the peer list (instead of using
> PROC_LOCK(p_leader)) looks like the obvious fix. Grabbing the peer list
> lock after unlocking p would avoid the deadlock and allow us to do
> whatever locking is needed for psignal().

Hmm, you are right. Yuck. *sigh* I think we need to check P_WEXIT in
fork1() for this to really DTRT as well.

>> Also, we might should check P_WEXIT and abort in fork1() if it is
>> set. (We don't appear to do that presently.)
>>
>
> Probably, but the list is also modified in the exit code. All those
> processes that we are sending SIGKILL to are removing themselves from
> the list.

Processes dying from SIGKILL that we send them aren't a problem, since
we have already read their p_peers member before we kill them. That's
the point of 'nq'. The problem is that 'nq' could exit and could be an
invalid pointer. If a process later in the list after 'nq' died, that is
not a problem either.

Well, how about this:

http://www.FreeBSD.org/~jhb/patches/ppeers.patch

-- 
John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
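For readers following the locking argument, the separate peer-list lock Don suggests can be modelled in user space. This is a hypothetical sketch. It is not the contents of ppeers.patch, which is not reproduced in this message, and it is not kernel code. Pthread mutexes stand in for PROC_LOCK(). The names peers_lock, wexit, killed, proc_init(), peer_fork(), peer_exit() and leader_exit() are invented for illustration.

The sketch assumes a lock order of peers_lock before any per-process lock. Under that order the leader drops its own lock before walking the list, and a peer cannot unlink itself while the walk is in progress.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the p_leader/p_peers list.
 * Not FreeBSD kernel code. Assumed lock order: peers_lock first,
 * then at most one proc's p_mtx at a time.
 */
struct proc {
    pthread_mutex_t p_mtx;   /* stands in for PROC_LOCK()/PROC_UNLOCK() */
    struct proc *p_leader;   /* leader of this task group */
    struct proc *p_peers;    /* next member of the leader's peer list */
    bool wexit;              /* stands in for P_WEXIT; protected by p_mtx */
    bool killed;             /* set where psignal(q, SIGKILL) would run */
};

/* Hypothetical global lock protecting every p_peers list. */
static pthread_mutex_t peers_lock = PTHREAD_MUTEX_INITIALIZER;

static void proc_init(struct proc *p, struct proc *leader)
{
    pthread_mutex_init(&p->p_mtx, NULL);
    p->p_leader = (leader != NULL) ? leader : p;
    p->p_peers = NULL;
    p->wexit = false;
    p->killed = false;
}

/* fork1() side: link a new peer, refusing if the leader is exiting. */
static bool peer_fork(struct proc *leader, struct proc *child)
{
    bool ok;

    proc_init(child, leader);
    pthread_mutex_lock(&peers_lock);
    pthread_mutex_lock(&leader->p_mtx);
    ok = !leader->wexit;             /* the P_WEXIT check fork1() lacks */
    pthread_mutex_unlock(&leader->p_mtx);
    if (ok) {
        child->p_peers = leader->p_peers;
        leader->p_peers = child;
    }
    pthread_mutex_unlock(&peers_lock);
    return ok;
}

/* exit1() side for a non-leader: unlink ourselves from the list. */
static void peer_exit(struct proc *p)
{
    pthread_mutex_lock(&peers_lock);
    for (struct proc *q = p->p_leader; q != NULL; q = q->p_peers) {
        if (q->p_peers == p) {
            q->p_peers = p->p_peers;
            break;
        }
    }
    pthread_mutex_unlock(&peers_lock);
}

/* exit1() side for the leader: set the exit flag, then kill every peer. */
static void leader_exit(struct proc *p)
{
    pthread_mutex_lock(&p->p_mtx);
    p->wexit = true;
    pthread_mutex_unlock(&p->p_mtx); /* never hold p_mtx while taking peers_lock */

    pthread_mutex_lock(&peers_lock);
    for (struct proc *q = p->p_peers; q != NULL; q = q->p_peers) {
        /* q cannot unlink itself while we hold peers_lock. */
        pthread_mutex_lock(&q->p_mtx);
        q->killed = true;
        pthread_mutex_unlock(&q->p_mtx);
    }
    pthread_mutex_unlock(&peers_lock);
}
```

Every path takes peers_lock before any per-process mutex and never holds two per-process mutexes at once. The cycle described in the thread, where p waits for q while q waits for its p_leader, therefore cannot form in this model.

peer_fork() checks the exit flag while holding peers_lock. A concurrently forked peer is therefore either refused or already on the list by the time leader_exit() walks it.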