From owner-freebsd-questions@FreeBSD.ORG Thu Sep 30 23:13:46 2004
Date: Thu, 30 Sep 2004 16:13:45 -0700 (MST)
From: Jason Barnes
To: questions@freebsd.org
Message-ID: <20040930160527.A58465@c3po.barnesos.net>
Subject: process will not die.

While running an mpirun job on my dual-processor SMP system (FreeBSD 4-STABLE from August 28), my program (initiated with the command line 'mpirun -np 2 ../sphagr') periodically dies, leaving a process that I can't kill -9. Here's the top output:

  216 root       2   0   166M   113M select 1  27:44  3.22%  3.22% XFree86
  327 jbarnes    2   0 72364K 58056K poll   1   6:53  0.00%  0.00% kdeinit
  549 jbarnes   28   0   400M 90744K CPU0   0   3:02  0.00%  0.00% sphagr
  267 jbarnes    2   0 23388K 10932K poll   1   0:42  0.00%  0.00% kdeinit

Here's ps -auxw | grep sph:

jbarnes   549  0.0  8.7 410076 90744  p2  R     3:39PM   3:01.97 sphagr -p4pg /usr/home/
jbarnes   550  0.0  0.0      0     0  p2  Z     3:39PM   0:00.00  (sphagr)

The 550 process I kill -9ed, but it's still there, and now when I try to kill it, it says 'no such process'.

Has anyone else had any experience with MPI processes being unkillable? Supposedly FreeBSD 5.3 has better SMP support -- might it solve this problem?

Thanks for your ideas,

- Jason Barnes