Date:      Wed, 10 Feb 2010 09:52:06 -0800
From:      Garrett Cooper <yanefbsd@gmail.com>
To:        Naveen Gujje <gujjenaveen@gmail.com>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: System() returning ECHILD error on FreeBSD 7.2
Message-ID:  <7d6fde3d1002100952g1518bc36r371020260e81a8c3@mail.gmail.com>
In-Reply-To: <39c945731002100925i2e466768peac89cdef15463f2@mail.gmail.com>
References:  <39c945731002100925i2e466768peac89cdef15463f2@mail.gmail.com>

On Wed, Feb 10, 2010 at 9:25 AM, Naveen Gujje <gujjenaveen@gmail.com> wrote:
> Naveen Gujje <gujjenaveen at gmail.com> wrote:
> >> signal(SIGCHLD, SigChildHandler);
> >>
> >> void
> >> SigChildHandler(int sig)
> >> {
> >>   pid_t pid;
> >>
> >>   /* get status of all dead procs */
> >>   do {
> >>     int procstat;
> >>     pid = waitpid(-1, &procstat, WNOHANG);
> >>     if (pid < 0) {
> >>       if (errno == EINTR)
> >>         continue;               /* ignore it */
> >>       else {
> >>         if (errno != ECHILD)
> >>           perror("getting waitpid");
> >>         pid = 0;                /* break out */
> >>       }
> >>     }
> >>     else if (pid != 0)
> >>       syslog(LOG_INFO, "child process %d completed", (int) pid);
> >>   } while (pid);
> >>
> >>   signal(SIGCHLD, SigChildHandler);
> >> }
>
>>There are several problems with your signal handler.
>
>>First, the perror() and syslog() functions are not re-entrant,
>>so they should not be used inside signal handlers.  This can
>>lead to undefined behaviour.  Please refer to the sigaction(2)
>>manual page for a list of functions that are considered safe
>>to be used inside signal handlers.
>
>>Second, you are using functions that may change the value of
>>the global errno variable.  Therefore you must save its value
>>at the beginning of the signal handler, and restore it at the
>>end.
>
>>Third (not a problem in this particular case, AFAICT, but
>>still good to know):  Unlike SysV systems, BSD systems do
>>_not_ automatically reset the signal action when the handler
>>is called.  Therefore you do not have to call signal() again
>>in the handler (but it shouldn't hurt either).  Because of
>>the semantic difference of the signal() function on different
>>systems, it is preferable to use sigaction(2) instead in
>>portable code.
>
> Okay, I followed your suggestion and changed my SigChildHandler to
>
> void
> SigChildHandler(int sig)
> {
>   pid_t pid;
>   int status;
>   int saved_errno = errno;
>
>   while (((pid = waitpid( (pid_t) -1, &status, WNOHANG)) > 0) ||
>          ((-1 == pid) && (EINTR == errno)))
>     ;
>
>   errno = saved_errno;
> }
>
> and used sigaction(2) to register this handler. Still, system(3) returns
> -1 with errno set to ECHILD.
>
> >> And, in some other part of the code, we call system() to add an ethernet
> >> interface. This system() call is returning -1 with errno set to ECHILD,
> >> though the passed command is executed successfully.  I have noticed that
> >> the problem is observed only after we register SigChildHandler. If I have a
> >> simple statement like system("ls") before and after the call to
> >> signal(SIGCHLD, SigChildHandler), the call before setting the signal handler
> >> succeeds without errors and the call after setting the signal handler
> >> returns -1 with errno set to ECHILD.
> >>
> >> Here, I believe that within the system() call, the child exited before the
> >> parent got a chance to call _wait4 and thus resulted in ECHILD error.
>
>>I don't think that can happen.
>
> >> But, for the child to exit without notifying the parent, SIGCHLD has to be
> >> set to SIG_IGN in the parent and this is not the case, because we are
> >> already setting it to SigChildHandler. If I set SIGCHLD to SIG_DFL before
> >> calling system() then I don't see this problem.
> >>
> >> I would like to know how setting SIGCHLD to SIG_DFL or SigChildHandler is
> >> making the difference.
>
>>The system() function temporarily blocks SIGCHLD (i.e. it
>>adds the signal to the process' signal mask).  However,
>>blocking is different from ignoring:  The signal is held
>>as long as it is blocked, and as soon as it is removed
>>from the mask, it is delivered, i.e. your signal handler
>>is called right before the system() function returns.
>
> Yes, I agree with you. Here, I believe, the point of blocking SIGCHLD
> is to give preference to wait4() of system() over any other waitXXX() in
> the parent process. But I still can't see the reason for wait4() to return -1.
>
>>And since you don't save the errno value, your signal
>>handler overwrites the value returned from the system()
>>function.  So you get ECHILD.
>
> I had a debug print just after wait4() in system() and before we unblock
> SIGCHLD. And it's clear that wait4() is returning -1 with errno as ECHILD.

    Isn't this section of the system(3) libcall essentially doing what
you want already? It waits for the child itself, so you'll never be able
to get that process's status by calling waitpid(2) on your own:

       do {
           pid = _wait4(savedpid, &pstat, 0, (struct rusage *)0);
       } while (pid == -1 && errno == EINTR);
       break;

    You typically get status via wait*(2) when using exec*(2) or via
the return codes from system(3), not system(3) with wait*(2)...
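
    As a rough sketch of those two routes (the function names
exit_code_from_status, run_with_system and run_with_fork_exec are
hypothetical, and running the command through /bin/sh in the second route
is an assumption made for symmetry with system(3)):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Turn a wait-style status into an exit code, or -1 if not a normal exit. */
int
exit_code_from_status(int status)
{
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

/* Route 1: system(3) forks, execs and waits itself; decode what it returns. */
int
run_with_system(const char *cmd)
{
	int status;

	assert(cmd != NULL);
	status = system(cmd);
	if (status == -1)
		return (-1);		/* fork or wait failed; inspect errno */
	return (exit_code_from_status(status));
}

/* Route 2: fork(2) + exec*(2), then wait for that specific child. */
int
run_with_fork_exec(const char *cmd)
{
	int status;
	pid_t pid, w;

	assert(cmd != NULL);
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);		/* only reached if exec failed */
	}
	do {
		/* Wait for this pid only, retrying if a signal interrupts us. */
		w = waitpid(pid, &status, 0);
	} while (w == -1 && errno == EINTR);
	if (w == -1)
		return (-1);
	return (exit_code_from_status(status));
}
```

    For example, run_with_system("exit 3") and run_with_fork_exec("exit 3")
should both yield 3 when nothing else in the process reaps the child first.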
Thanks,
-Garrett


