From owner-freebsd-current@FreeBSD.ORG Mon May 26 22:18:12 2003 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id E457137B401; Mon, 26 May 2003 22:18:12 -0700 (PDT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 81D7143F3F; Mon, 26 May 2003 22:18:11 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.9/8.12.9) with ESMTP id h4R5I3M7088200; Mon, 26 May 2003 22:18:07 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <200305270518.h4R5I3M7088200@gw.catspoiler.org> Date: Mon, 26 May 2003 22:18:03 -0700 (PDT) From: Don Lewis To: DougB@FreeBSD.org In-Reply-To: <200305130601.h4D612M7049395@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: freebsd-current@FreeBSD.org Subject: Re: mtv leaves a zombie after exit X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 May 2003 05:18:13 -0000 On 12 May, To: DougB@freebsd.org wrote: > On 12 May, Doug Barton wrote: >> On Mon, 12 May 2003, Terry Lambert wrote: >> >>> A "ps -gaxl" will print the wait channel, which may be more >>> informative. >> >> UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND >> 1000 0 1 0 -84 0 0 0 - ZW p4 0:00.00 (mtvp) >> >> BTW, inre your question about the shell, it's bash. But, I get the exact >> same results if mtv is started as a child of the shell, as a child of >> windowmaker, or as a child of netscape. > > Does this application use Linux threads? The following code in wait1() > makes me think that if a thread somehow gets orphaned by the parent > Linux process, it will never get reaped. The exit code for Linux should > probably wait for any child threads to exit. > > LIST_FOREACH(p, &q->p_children, p_sibling) { > PROC_LOCK(p); > if (uap->pid != WAIT_ANY && > p->p_pid != uap->pid && p->p_pgid != -uap->pid) { > PROC_UNLOCK(p); > continue; > } > > /* > * This special case handles a kthread spawned by linux_clone > * (see linux_misc.c). The linux_wait4 and linux_waitpid > * functions need to be able to distinguish between waiting > * on a process and waiting on a thread. It is a thread if > * p_sigparent is not SIGCHLD, and the WLINUXCLONE option > * signifies we want to wait for threads and not processes. > */ > if ((p->p_sigparent != SIGCHLD) ^ > ((uap->options & WLINUXCLONE) != 0)) { > PROC_UNLOCK(p); > continue; > } I did some more digging, and it doesn't look like the p_sigparent test should be causing problems, at least in the normal case, because of the following code in exit1(): sx_xlock(&proctree_lock); q = LIST_FIRST(&p->p_children); if (q != NULL) /* only need this if any child is S_ZOMB */ wakeup(initproc); for (; q != NULL; q = nq) { nq = LIST_NEXT(q, p_sibling); PROC_LOCK(q); proc_reparent(q, initproc); q->p_sigparent = SIGCHLD; /* * Traced processes are killed * since their existence means someone is screwing up. */ if (q->p_flag & P_TRACED) { q->p_flag &= ~P_TRACED; psignal(q, SIGKILL); } PROC_UNLOCK(q); } If the parent process exits while it has outstanding Linux threads, it should change p_sigparent to SIGCHLD, which should allow init to reap it. The locking also looks ok to me. There are still a couple of possibilities. The first is the following code in kern_ptrace() that handles PT_DETACH: if (req == PT_DETACH) { /* reset process parent */ if (p->p_oppid != p->p_pptr->p_pid) { struct proc *pp; PROC_UNLOCK(p); pp = pfind(p->p_oppid); if (pp == NULL) pp = initproc; else PROC_UNLOCK(pp); PROC_LOCK(p); proc_reparent(p, pp); } p->p_flag &= ~(P_TRACED | P_WAITED); p->p_oppid = 0; /* should we send SIGCHLD? */ } If the Linux thread in question were being traced when the parent process exited, it looks like the thread could get reparented to init without having p_sigparent set to SIGCHLD. A more likely cause of the problem is this code in exit1(): /* * Notify parent that we're gone. If parent has the PS_NOCLDWAIT * flag set, or if the handler is set to SIG_IGN, notify process * 1 instead (and hope it will handle this situation). */ PROC_LOCK(p->p_pptr); mtx_lock(&p->p_pptr->p_sigacts->ps_mtx); if (p->p_pptr->p_sigacts->ps_flag & (PS_NOCLDWAIT | PS_CLDSIGIGN)) { struct proc *pp; mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx); pp = p->p_pptr; PROC_UNLOCK(pp); proc_reparent(p, initproc); PROC_LOCK(p->p_pptr); /* * If this was the last child of our parent, notify * parent, so in case he was wait(2)ing, he will * continue. */ if (LIST_EMPTY(&pp->p_children)) wakeup(pp); If the parent process of the Linux thread set its SIGCHLD handler to SIG_IGN, then when the Linux thread exited, it would be reparented to init but its p_sigparent would not be set to SIGCHLD, and init would not be able to reap the thread. As a quick and dirty test, try the patch below. One thing I'm not sure about is how Linux threads and SIG_IGN should really mix. Index: sys/kern/kern_exit.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v retrieving revision 1.214 diff -u -r1.214 kern_exit.c --- sys/kern/kern_exit.c 13 May 2003 20:35:59 -0000 1.214 +++ sys/kern/kern_exit.c 27 May 2003 04:39:19 -0000 @@ -439,6 +439,7 @@ pp = p->p_pptr; PROC_UNLOCK(pp); proc_reparent(p, initproc); + p->p_sigparent = SIGCHLD; PROC_LOCK(p->p_pptr); /* * If this was the last child of our parent, notify Index: sys/kern/sys_process.c =================================================================== RCS file: /home/ncvs/src/sys/kern/sys_process.c,v retrieving revision 1.108 diff -u -r1.108 sys_process.c --- sys/kern/sys_process.c 25 Apr 2003 20:02:16 -0000 1.108 +++ sys/kern/sys_process.c 27 May 2003 04:39:42 -0000 @@ -587,6 +587,7 @@ PROC_UNLOCK(pp); PROC_LOCK(p); proc_reparent(p, pp); + p->p_sigparent = SIGCHLD; } p->p_flag &= ~(P_TRACED | P_WAITED); p->p_oppid = 0;