Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 26 May 2003 22:18:03 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        DougB@FreeBSD.org
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: mtv leaves a zombie after exit
Message-ID:  <200305270518.h4R5I3M7088200@gw.catspoiler.org>
In-Reply-To: <200305130601.h4D612M7049395@gw.catspoiler.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 12 May, To: DougB@freebsd.org wrote:
> On 12 May, Doug Barton wrote:
>> On Mon, 12 May 2003, Terry Lambert wrote:
>> 
>>> A "ps -gaxl" will print the wait channel, which may be more
>>> informative.
>> 
>>  UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT  TT    TIME COMMAND
>> 1000   0    1   0 -84  0   0   0 -      ZW    p4 0:00.00  (mtvp)
>> 
>> BTW, inre your question about the shell, it's bash. But, I get the exact
>> same results if mtv is started as a child of the shell, as a child of
>> windowmaker, or as a child of netscape.
> 
> Does this application use Linux threads?  The following code in wait1()
> makes me think that if a thread somehow gets orphaned by the parent
> Linux process, it will never get reaped.  The exit code for Linux should
> probably wait for any child threads to exit.
> 
>         LIST_FOREACH(p, &q->p_children, p_sibling) {
>                 PROC_LOCK(p);
>                 if (uap->pid != WAIT_ANY &&
>                     p->p_pid != uap->pid && p->p_pgid != -uap->pid) {
>                         PROC_UNLOCK(p);
>                         continue;
>                 }
> 
>                 /*
>                  * This special case handles a kthread spawned by linux_clone
>                  * (see linux_misc.c).  The linux_wait4 and linux_waitpid
>                  * functions need to be able to distinguish between waiting
>                  * on a process and waiting on a thread.  It is a thread if
>                  * p_sigparent is not SIGCHLD, and the WLINUXCLONE option
>                  * signifies we want to wait for threads and not processes.
>                  */
>                 if ((p->p_sigparent != SIGCHLD) ^
>                     ((uap->options & WLINUXCLONE) != 0)) {
>                         PROC_UNLOCK(p);
>                         continue;
>                 }

I did some more digging, and it doesn't look like the p_sigparent test
should be causing problems, at least in the normal case, because of the
following code in exit1():

        sx_xlock(&proctree_lock);
        q = LIST_FIRST(&p->p_children);
        if (q != NULL)          /* only need this if any child is S_ZOMB */
                wakeup(initproc);
        for (; q != NULL; q = nq) {
                nq = LIST_NEXT(q, p_sibling);
                PROC_LOCK(q);
                proc_reparent(q, initproc);
                q->p_sigparent = SIGCHLD;
                /*
                 * Traced processes are killed
                 * since their existence means someone is screwing up.
                 */
                if (q->p_flag & P_TRACED) {
                        q->p_flag &= ~P_TRACED;
                        psignal(q, SIGKILL);
                }
                PROC_UNLOCK(q);
        }

If the parent process exits while it has outstanding Linux threads, it
should change p_sigparent to SIGCHLD, which should allow init to reap
it.  The locking also looks ok to me.

There are still a couple of possibilities.  The first is the following
code in kern_ptrace() that handles PT_DETACH:

                if (req == PT_DETACH) {
                        /* reset process parent */
                        if (p->p_oppid != p->p_pptr->p_pid) {
                                struct proc *pp;

                                PROC_UNLOCK(p);
                                pp = pfind(p->p_oppid);
                                if (pp == NULL)
                                        pp = initproc;
                                else
                                        PROC_UNLOCK(pp);
                                PROC_LOCK(p);
                                proc_reparent(p, pp);
                        }
                        p->p_flag &= ~(P_TRACED | P_WAITED);
                        p->p_oppid = 0;

                        /* should we send SIGCHLD? */
                }

If the Linux thread in question were being traced when the parent
process exited, it looks like the thread could get reparented to init
without having p_sigparent set to SIGCHLD.

A more likely cause of the problem is this code in exit1():

        /*
         * Notify parent that we're gone.  If parent has the PS_NOCLDWAIT
         * flag set, or if the handler is set to SIG_IGN, notify process
         * 1 instead (and hope it will handle this situation).
         */
        PROC_LOCK(p->p_pptr);
        mtx_lock(&p->p_pptr->p_sigacts->ps_mtx);
        if (p->p_pptr->p_sigacts->ps_flag & (PS_NOCLDWAIT | PS_CLDSIGIGN)) {
                struct proc *pp;
                
                mtx_unlock(&p->p_pptr->p_sigacts->ps_mtx);
                pp = p->p_pptr;
                PROC_UNLOCK(pp);
                proc_reparent(p, initproc);
                PROC_LOCK(p->p_pptr);
                /*
                 * If this was the last child of our parent, notify
                 * parent, so in case he was wait(2)ing, he will
                 * continue.
                 */
                if (LIST_EMPTY(&pp->p_children)) 
                        wakeup(pp);

If the parent process of the Linux thread set its SIGCHLD handler to
SIG_IGN, then when the Linux thread exited, it would be reparented to
init but its p_sigparent would not be set to SIGCHLD, and init would not
be able to reap the thread.

As a quick and dirty test, try the patch below.  One thing I'm not sure
about is how Linux threads and SIG_IGN should really mix.

Index: sys/kern/kern_exit.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_exit.c,v
retrieving revision 1.214
diff -u -r1.214 kern_exit.c
--- sys/kern/kern_exit.c	13 May 2003 20:35:59 -0000	1.214
+++ sys/kern/kern_exit.c	27 May 2003 04:39:19 -0000
@@ -439,6 +439,7 @@
 		pp = p->p_pptr;
 		PROC_UNLOCK(pp);
 		proc_reparent(p, initproc);
+		p->p_sigparent = SIGCHLD;
 		PROC_LOCK(p->p_pptr);
 		/*
 		 * If this was the last child of our parent, notify
Index: sys/kern/sys_process.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/sys_process.c,v
retrieving revision 1.108
diff -u -r1.108 sys_process.c
--- sys/kern/sys_process.c	25 Apr 2003 20:02:16 -0000	1.108
+++ sys/kern/sys_process.c	27 May 2003 04:39:42 -0000
@@ -587,6 +587,7 @@
 					PROC_UNLOCK(pp);
 				PROC_LOCK(p);
 				proc_reparent(p, pp);
+				p->p_sigparent = SIGCHLD;
 			}
 			p->p_flag &= ~(P_TRACED | P_WAITED);
 			p->p_oppid = 0;



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200305270518.h4R5I3M7088200>