Date: Thu, 30 Sep 2004 20:29:32 -0500
From: Dan Nelson <dnelson@allantgroup.com>
To: Jason Barnes <jbarnes@c3po.barnesos.net>
Cc: questions@freebsd.org
Subject: Re: process will not die.
Message-ID: <20041001012932.GH22530@dan.emsphone.com>
In-Reply-To: <20040930160527.A58465@c3po.barnesos.net>
References: <20040930160527.A58465@c3po.barnesos.net>

In the last episode (Sep 30), Jason Barnes said:
> While running an mpirun job on my dual-processor SMP system
> (FreeBSD 4-STABLE from August 28), my program (initiated with the
> command line 'mpirun -np 2 ../sphagr') periodically dies, leaving a
> process that I can't kill -9.  Here's the top:
>
> here's ps -auxw | grep sph:
>
> jbarnes 549 0.0 8.7 410076 90744 p2 R 3:39PM 3:01.97 sphagr -p4pg /usr/home/
> jbarnes 550 0.0 0.0      0     0 p2 Z 3:39PM 0:00.00 (sphagr)
>
> The 550 process I kill -9ed, but its still there, and now when I
> try to kill it it says 'no such process'.

Processes in the Z state have already exited, but their parent process
has not retrieved their status with one of the wait*() functions.  The
entry in the process table will stay until that happens.  You can run
"ps axlp 550" and look at the PPID column to determine the parent's
pid.

The parent code needs to either wait() for the child status, or if it
doesn't need to know when the child exits, ignore SIGCHLD or set the
SA_NOCLDWAIT flag with sigaction().

-- 
	Dan Nelson
	dnelson@allantgroup.com