From: John Baldwin
To: Tim Robbins
Cc: Poul-Henning Kamp, current@FreeBSD.org, alfred@FreeBSD.org, Kris Kennaway
Subject: Re: NULL pointer problem in pid selection ?
Date: Mon, 10 Mar 2003 17:46:40 -0500 (EST)
In-Reply-To: <20030311084346.A63542@dilbert.robbins.dropbear.id.au>

On 10-Mar-2003 Tim Robbins wrote:
> On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
>
>> On 08-Mar-2003 Kris Kennaway wrote:
>> > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
>> >>
>> >> Just got this crash on -current, and I believe I have seen similar
>> >> before.  addr2line(1) reports the faulting address to be
>> >>         ../../../kern/kern_fork.c:395
>> >> which is in the inner loop of pid collision avoidance.
>> >
>> > I've been running this patch from Alfred for the past month or so on
>> > bento, which has fixed a similar panic I was seeing regularly.
>>
>> Using just a shared lock instead of an xlock should be ok there.  You
>> aren't modifying the process tree, just looking at it.  OTOH, the
>> proc lock is supposed to protect p_pgrp and p_session, so they shouldn't
>> be NULL. :(
>
> I have a suspicion that the bug is actually in wait1():
>
>         sx_xlock(&proctree_lock);
> [...]
>                 /*
>                  * Remove other references to this process to ensure
>                  * we have an exclusive reference.
>                  */
>                 leavepgrp(p);
>
>                 sx_xlock(&allproc_lock);
>                 LIST_REMOVE(p, p_list); /* off zombproc */
>                 sx_xunlock(&allproc_lock);
>
>                 LIST_REMOVE(p, p_sibling);
>                 sx_xunlock(&proctree_lock);
>
>
> Shouldn't we be removing the process from zombproc before setting
> p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
> fork1() and wait1() are still protected by Giant?

Giant doesn't help you with sleeps.  However, removing the process from
zombproc before destroying its other linkages might be more correct, yes.

> Tim

-- 

John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
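
The ordering question raised in this thread can be illustrated with a small
user-space sketch in C. Everything in it is a hypothetical stand-in: struct
proc, struct pgrp, toy_leavepgrp(), the hand-rolled zombproc list and both
reap_*() helpers are invented for illustration and are not the kernel's
definitions. There is no locking and no concurrency, so it only shows which
intermediate state a list walker could observe under each ordering. It does
not reproduce the actual panic and is not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; NOT the kernel's structures. */
struct pgrp {
	int pg_id;
};

struct proc {
	int          p_pid;
	struct pgrp *p_pgrp;	/* NULL once the process has left its group */
	struct proc *p_next;	/* next entry on the toy zombie list */
};

static struct proc *zombproc;	/* head of the toy zombie list */

static void
zomb_insert(struct proc *p)
{
	assert(p != NULL);
	p->p_next = zombproc;
	zombproc = p;
}

static void
zomb_remove(struct proc *p)
{
	struct proc **pp;

	assert(p != NULL);
	for (pp = &zombproc; *pp != NULL; pp = &(*pp)->p_next) {
		if (*pp == p) {
			*pp = p->p_next;
			p->p_next = NULL;
			return;
		}
	}
}

/* Stand-in for the effect of leavepgrp() discussed above. */
static void
toy_leavepgrp(struct proc *p)
{
	p->p_pgrp = NULL;
}

/* Returns 1 if every process still on the list has a usable p_pgrp. */
static int
scan_is_safe(void)
{
	struct proc *p;

	for (p = zombproc; p != NULL; p = p->p_next)
		if (p->p_pgrp == NULL)
			return (0);
	return (1);
}

/*
 * Order as in the quoted wait1() excerpt: leave the group, then unlink.
 * Returns what a list walker would have found between the two steps.
 */
static int
reap_as_quoted(struct proc *p)
{
	int safe;

	toy_leavepgrp(p);
	safe = scan_is_safe();	/* p is still listed, p_pgrp already NULL */
	zomb_remove(p);
	return (safe);
}

/* Suggested order: unlink first, then leave the group. */
static int
reap_reordered(struct proc *p)
{
	int safe;

	zomb_remove(p);
	safe = scan_is_safe();	/* p is no longer reachable from the list */
	toy_leavepgrp(p);
	return (safe);
}
```

With toy processes on the list whose p_pgrp is set, reap_as_quoted() returns
0 because the list briefly holds an entry with a NULL p_pgrp, while
reap_reordered() returns 1 because the entry is unlinked before its group
pointer is cleared.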