Date: Wed, 10 Feb 2010 22:36:20 +0100
From: Jilles Tjoelker
To: Naveen Gujje
Cc: freebsd-hackers@freebsd.org
Subject: Re: System() returning ECHILD error on FreeBSD 7.2

On Wed, Feb 10, 2010 at 12:44:57PM +0530, Naveen Gujje wrote:
> [SIGCHLD handler that calls waitpid()]
> And, in some other part of the code, we call system() to add an ethernet
> interface. This system() call is returning -1 with errno set to ECHILD,
> though the passed command is executed successfully. I have noticed that,
> the problem is observed only after we register SigChildHandler.
> If I have a simple statement like system("ls") before and after the call to
> signal(SIGCHLD, SigChildHandler), the call before setting signal handler
> succeeds without errors and the call after setting signal handler returns -1
> with errno set to ECHILD.
> Here, I believe that within the system() call, the child exited before the
> parent got a chance to call _wait4 and thus resulted in ECHILD error. But,
> for the child to exit without notifying the parent, SIGCHLD has to be set to
> SIG_IGN in the parent and this is not the case, because we are already
> setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before calling
> system() then I don't see this problem.
> I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
> making the difference.

I think your process is multi-threaded.

In a single-threaded process, system()'s signal masking ensures that
system() itself reaps the zombie, leaving the signal handler with nothing
to do (in fact, as of FreeBSD 7.0 the handler will not be called at all
unless there are other child processes).

In a multi-threaded process, each thread has its own signal mask, and
system() can only change the mask of the thread that calls it. If another
thread has SIGCHLD unblocked, the signal handler can run in that thread and
race with system() to call waitpid() first. If the handler wins, it reaps
the child, and the wait inside system() fails with ECHILD. With SIGCHLD set
to SIG_DFL there is no handler calling waitpid(), so nothing competes with
system() for the child.

Possible fixes:
- use the siginfo_t information to call waitpid() only for child processes
  you know about;
- set up the signal masks so the bad situation does not occur (note that
  the signal mask is inherited across pthread_create());
- call fork/execve yourself and manage the child process's exit.

Note that POSIX does not require system() to be thread-safe.

--
Jilles Tjoelker
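A minimal C sketch of the signal-mask fix, assuming a POSIX system with pthreads and that only one thread ever calls system(). SIGCHLD is blocked before pthread_create() so every worker inherits the blocked mask, and it is left unblocked only in the thread that calls system(). Because system() blocks SIGCHLD in its own thread while it waits, and no other thread can take the signal, a catch-all waitpid(-1, ...) in the handler can no longer reap system()'s child first. The names sigchld_handler(), worker() and run_with_masked_workers() are hypothetical and are not code from the original program.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations */
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler modelled on the one in the question: reap every
 * exited child without blocking. errno is saved because waitpid() may
 * overwrite it in the interrupted code. */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

/* Hypothetical worker thread. It inherits SIGCHLD blocked from its
 * creator, so the handler can never run on this thread. */
static void *worker(void *arg)
{
    (void)arg;
    sleep(1);               /* stand-in for real work */
    return NULL;
}

/* Installs the handler, starts one worker with SIGCHLD blocked, then runs
 * cmd via system() from the calling thread. Returns the command's exit
 * status, or -1 on failure. Assumed to be called before any other
 * threads exist. */
int run_with_masked_workers(const char *cmd)
{
    struct sigaction sa;
    sigset_t chld, old;
    pthread_t tid;
    int err, status;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD before pthread_create(): the new thread inherits it. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    pthread_sigmask(SIG_BLOCK, &chld, &old);
    err = pthread_create(&tid, NULL, worker, NULL);

    /* Restore the original mask in this thread only; it is now the only
     * thread in which the handler can run. */
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    if (err != 0) {
        fprintf(stderr, "pthread_create: %s\n", strerror(err));
        return -1;
    }

    status = system(cmd);   /* blocks SIGCHLD in this thread while waiting */
    pthread_join(tid, NULL);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called from main() before any other threads are started, run_with_masked_workers("exit 3") should return 3 rather than failing with ECHILD. This keeps the existing catch-all handler; if several threads need to run commands, the siginfo_t or fork/execve approaches are a better fit.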