Date: Mon, 9 Dec 2002 09:54:54 -0500 (EST)
From: alan <alan@pair.com>
To: Anton Berezin <tobez@FreeBSD.org>
Cc: <arch@FreeBSD.org>
Subject: Re: SA_NOCLDWAIT and waitXXX strangeties (Was: Re: ports/45972: Perl system() calls will hang if the process has other forked children.)
Message-ID: <Pine.BSF.4.30.0212090938450.95351-100000@smx.pair.com>
In-Reply-To: <20021208171847.GE35282@heechee.tobez.org>

On Sun, 8 Dec 2002, Anton Berezin wrote:

> Alan,
>
> On Wed, Dec 04, 2002 at 10:30:02AM -0800, alan wrote:
>
> > The problem is, when you set SA_NOCLDWAIT, subsequent calls to
> > wait() (or wait4()) wait for All child processes to exit, not
> > just the process ID that wait() is called on.  Since Perl's
> > system() calls wait4() on its recently forked child, the system()
> > call doesn't return until All of the perl process's children
> > exit.  This doesn't seem like particularly desirable behavior,
> > but it is documented in FreeBSD's sigaction man page:
> >
> >     SA_NOCLDWAIT    If this bit is set when calling
> >                     sigaction() for the SIGCHLD signal, the system will
> >                     not create zombie processes when children of the
> >                     calling process exit.  If the calling process
> >                     subsequently issues a wait(2) (or equivalent), it
> >                     blocks until all of the calling process's child
> >                     processes terminate, and then returns a value of -1
> >                     with errno set to ECHILD.
>
> I don't like this behavior.  Personally, I would like to think that it
> should not wait for all children if any of the waitXXX() explicitly
> specifies a pid to wait for.  Obviously, there is a difference between
> how I would like to interpret the docs and the reality. :-/  So in this
> particular instance it looks to me like a bug (or at least, a gray area)
> in FreeBSD, which should be discussed, hence the copy to arch-.

I agree with all of this.  For what it's worth, OS X has the same problem
(not surprisingly).  The OpenBSD docs for SA_NOCLDWAIT are the same as
those for FreeBSD, but I don't have an OpenBSD installation handy to test
their interpretation of this gray area.

> (The complete description of the problem, for the benefits of arch-
> folks, can be found here:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=45972)
>
> > The quick fix is to stop using SA_NOCLDWAIT when it ignores SIGCHLD.
> > This may create unwanted zombie processes, though.
>
> Why would it?
> IIRC, the BSD semantics for SIG_IGN for SIGCHLD is
> exactly `do not create zombies' semantics, so the combination of a perl
> program setting SIG_IGN for SIGCHLD and perl itself setting SA_NOCLDWAIT
> at the same time looks like an attempt to do the same thing twice.  But,
> regardless of whether this behavior of perl is a bug or not, I would
> still like to hear our signal handling experts on the issue.

I may have erroneously interpreted the SA_NOCLDWAIT documentation to imply
that if SA_NOCLDWAIT was not set, then zombie processes Would be created.
I will try tweaking the SA_NOCLDWAIT out of Perl and seeing what happens.
I can only imagine that the SA_NOCLDWAIT code was added to Perl for a
reason; so maybe there's Some platform where it is a necessary part of
avoiding zombies, even though it's redundant in FreeBSD.

> > The better fix is probably not to wait() at all if SIGCHLD is
> > currently being ignored with SA_NOCLDWAIT.
>
> One has to waitXXX() in order to implement a system(3)-like call.
> FreeBSD's own system(3) does wait4(), just like perl's system() does.

I realized later that this wasn't possible for implementing a synchronous
system() call.  I wrote a patch for Perl 5.8.0 which has Perl's 'system'
temporarily block SIGCHLD and remove the SA_NOCLDWAIT flag if SIGCHLD is
being ignored while it calls wait4() on its child.  This fixes the symptom
in Perl's system, but doesn't help any other case in Perl where SIGCHLD is
ignored and then you call wait().  Now that you've said that SA_NOCLDWAIT
may not be necessary at all, I'm fairly sure it's not the right solution
in any case.

> > Below is C code which compiles and runs on FreeBSD using gcc, and
> > which demonstrates the difference in behavior when SA_NOCLDWAIT is
> > used and is not used.  I would like confirmation that this Is in fact
> > the expected behavior of FreeBSD, and that in this case Perl is making
> > incorrect assumptions about how FreeBSD will behave.
>
> Exactly.
> Once you have identified the problem (thanks!), it is easy to
> fix perl.  The question is that maybe FreeBSD also need fixing.

Thank you very much as well!  As Andrew Hunt and David Thomas say in
"The Pragmatic Programmer" (Tip 26): "Select Isn't Broken."  As a Perl
programmer it's very hard to convince myself that something is Actually
wrong with either Perl or the OS.  I'm glad that once I was able to
convince myself of this, it was easier to convince someone who could do
something about it :)

Alan

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message