Date: Fri, 10 Apr 2009 22:30:03 GMT
From: Jilles Tjoelker <jilles@stack.nl>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/108390: [libc] [patch] wait4() erroneously waits for all children when SIGCHLD is SIG_IGN [regression]
Message-ID: <200904102230.n3AMU31L033281@freefall.freebsd.org>
The following reply was made to PR bin/108390; it has been noted by GNATS.

From: Jilles Tjoelker <jilles@stack.nl>
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: bin/108390: [libc] [patch] wait4() erroneously waits for all children when SIGCHLD is SIG_IGN [regression]
Date: Sat, 11 Apr 2009 00:23:56 +0200

Maybe this can be reopened as a change-request.

The way wait4() has worked when SA_NOCLDWAIT is set has been a bit strange ever since it was introduced in 1997 in
http://svn.freebsd.org/viewvc/base?view=revision&revision=29340

The comment there seems to agree with what POSIX says, which is not as useful as it could reasonably be:

    /*
     * If this was the last child of our parent, notify
     * parent, so in case he was wait(2)ing, he will
     * continue.
     */
    if (LIST_EMPTY(&pp->p_children))
        wakeup(pp);

However, if you pass wait4() a pid or pgid that matches no child processes, it returns ECHILD immediately, without waiting for any other children to terminate (see kern_wait() in kern_exit.c).

Together with signal semantics this leads to strange results. A wait4() on a specific process will normally wait for all child processes to terminate, but if a signal is caught in the meantime, wait4() is restarted and returns ECHILD immediately if all processes matching the argument have already terminated. I have tried this by catching SIGALRM with an empty handler and calling alarm(2) before test_it("IGNORE SIGCHLD"); "P short child finished" happens after 2 seconds, "P waiting for long running child" happens after 8 more seconds.

Another way this can lead to strange results is if something else wakes up the proc pointer. It seems this can happen if you use SA_NOCLDWAIT in a multithreaded process, and have one thread wait for a child process and another do a vfork(). The child process from the vfork will wake up the proc pointer to notify that it has execed, and this will wake up both the wait and the vfork.
(By the way, vfork() blocking only the calling thread and not the entire process is rather weird considering the original reason for the blocking.)

Possible related issue: what if the vfork child did not exec and the wakeup is suppressed; does this freeze the parent until all other children are gone?

Perhaps there are other things that can wake up the proc pointer?

I think removing the if (LIST_EMPTY(&pp->p_children)) condition and always doing the wakeup(pp) will yield a more consistent behaviour, which seems more useful for applications. wait4() with SA_NOCLDWAIT will then wait for all matching child processes to terminate and return ECHILD (unless there are still zombies left from a time when SA_NOCLDWAIT was not set). The behaviour described in POSIX is available by specifying any child process (-1).

gavin reports that the test program works as the submitter wants (wait4(short_pid, ...) returns immediately after short_pid terminates) on Solaris 10.

While doing this, I also noticed a bug in kern_wait(). Ptrace reparents a process to a debugger. When the process exits, the debugger will pick it up in kern_wait(), which reparents the process back to its original parent and signals the original parent as for a normal exit. This code does not check for SA_NOCLDWAIT or SIGCHLD being set to SIG_IGN, and will leave a zombie anyway.

Note that the special handling for SIG_IGN for SIGCHLD in FreeBSD 5 and newer works pretty much the same way as SA_NOCLDWAIT, so it is not much of a factor in the above discussion.

-- 
Jilles Tjoelker
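
As an illustration of the scenario discussed in this message, the following is a minimal userland sketch, not the PR's actual test program. It sets SA_NOCLDWAIT on SIGCHLD, forks a short-lived and a long-lived child, and measures how long a wait on the short child's pid blocks. The names struct wait_result, spawn_sleeper() and wait_short_child() are hypothetical helpers invented for this sketch, and waitpid() is used in place of wait4() for brevity. The call is expected to end with ECHILD in any case; how long it blocks is what differs between systems.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for this sketch; not part of any system API. */
struct wait_result {
    pid_t  ret;      /* return value of waitpid() on the short child */
    int    err;      /* errno observed right after that call */
    double elapsed;  /* seconds spent blocked in waitpid() */
};

/* Fork a child that sleeps for `secs` seconds and then exits. */
static pid_t spawn_sleeper(unsigned secs)
{
    pid_t pid = fork();
    if (pid == 0) {
        sleep(secs);
        _exit(0);
    }
    return pid;
}

/*
 * With SA_NOCLDWAIT set, wait for the short child only and record
 * when the call returns. Remaining children are drained before
 * the previous SIGCHLD disposition is restored.
 */
static struct wait_result wait_short_child(unsigned short_secs,
                                           unsigned long_secs)
{
    struct wait_result r = { -1, 0, 0.0 };
    struct sigaction sa, old;
    struct timespec t0, t1;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;   /* children do not become zombies */
    rc = sigaction(SIGCHLD, &sa, &old);
    assert(rc == 0);
    (void)rc;

    pid_t short_pid = spawn_sleeper(short_secs);
    pid_t long_pid = spawn_sleeper(long_secs);
    if (short_pid > 0 && long_pid > 0) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        errno = 0;
        r.ret = waitpid(short_pid, NULL, 0);
        r.err = errno;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        r.elapsed = (double)(t1.tv_sec - t0.tv_sec)
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    } else {
        r.err = errno;            /* fork() failed */
    }

    /* Wait until no children are left, then restore the old handler. */
    while (waitpid(-1, NULL, 0) > 0 || errno == EINTR)
        ;
    sigaction(SIGCHLD, &old, NULL);
    return r;
}
```

Calling wait_short_child(2, 10) mirrors the 2 second and 10 second children implied by the timings quoted above. On a kernel with the behaviour described in this PR, elapsed should come out near the long child's lifetime; with the proposed change (or on a system behaving as reported for Solaris 10) it should be near the short child's lifetime.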