From owner-freebsd-current  Mon Mar 10 14:51:36 2003
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2E27D37B405
	for <current@FreeBSD.org>; Mon, 10 Mar 2003 14:51:34 -0800 (PST)
Received: from mail.gmx.net (imap.gmx.net [213.165.64.20])
	by mx1.FreeBSD.org (Postfix) with SMTP id EF56843FB1
	for <current@FreeBSD.org>; Mon, 10 Mar 2003 14:51:30 -0800 (PST)
	(envelope-from tmoestl@gmx.net)
Received: (qmail 28445 invoked by uid 0); 10 Mar 2003 22:51:29 -0000
Received: from p508E4D4F.dip.t-dialin.net (HELO galatea.local) (80.142.77.79)
  by mail.gmx.net (mp023-rz3) with SMTP; 10 Mar 2003 22:51:29 -0000
Received: from tmm by galatea.local with local (Exim 4.12 #1)
	id 18sW7c-0000s1-00; Mon, 10 Mar 2003 23:51:52 +0100
Date: Mon, 10 Mar 2003 23:51:52 +0100
From: Thomas Moestl <tmoestl@gmx.net>
To: Tim Robbins <tjr@FreeBSD.org>
Cc: John Baldwin <jhb@FreeBSD.org>,
	Kris Kennaway <kris@obsecurity.org>, alfred@FreeBSD.org,
	current@FreeBSD.org, Poul-Henning Kamp <phk@phk.freebsd.dk>
Subject: Re: NULL pointer problem in pid selection ?
Message-ID: <20030310225151.GA2803@crow.dom2ip.de>
Mail-Followup-To: Tim Robbins <tjr@FreeBSD.org>,
	John Baldwin <jhb@FreeBSD.org>, Kris Kennaway <kris@obsecurity.org>,
	alfred@FreeBSD.org, current@FreeBSD.org,
	Poul-Henning Kamp <phk@phk.freebsd.dk>
References: <20030308213535.GE56020@rot13.obsecurity.org> <XFMail.20030310130015.jhb@FreeBSD.org> <20030311084346.A63542@dilbert.robbins.dropbear.id.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030311084346.A63542@dilbert.robbins.dropbear.id.au>
User-Agent: Mutt/1.4i
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-current.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-current>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-current>
X-Loop: FreeBSD.ORG

On Tue, 2003/03/11 at 08:43:46 +1100, Tim Robbins wrote:
> On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
> 
> > On 08-Mar-2003 Kris Kennaway wrote:
> > > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > >> 
> > >> Just got this crash on -current, and I belive I have seen similar
> > >> before.  addr2line(1) reports the faulting address to be
> > >>      ../../../kern/kern_fork.c:395
> > >> which is in the inner loop of pid collision avoidance.
> > > 
> > > I've been running this patch from Alfred for the past month or so on
> > > bento, which has fixed a similar panic I was seeing regularly.
> > 
> > Using just a shared lock instead of an xlock should be ok there.  You
> > aren't modifying the process tree, just looking at it.  OTOH, the
> > proc lock is supposed to protect p_grp and p_session, so they shouldn't
> > be NULL. :(
> 
> I have a suspiscion that the bug is actually in wait1():
> 
>         sx_xlock(&proctree_lock);
> 	[...]
> 	/*
> 	 * Remove other references to this process to ensure
> 	 * we have an exclusive reference.
> 	 */
> 	leavepgrp(p);
> 
> 	sx_xlock(&allproc_lock);
> 	LIST_REMOVE(p, p_list); /* off zombproc */
> 	sx_xunlock(&allproc_lock);
> 
> 	LIST_REMOVE(p, p_sibling);
> 	sx_xunlock(&proctree_lock);
> 
> 
> Shouldn't we be removing the process from zombproc before setting
> p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
> fork1() and wait1() are still protected by Giant?

Hmmm, I think you're right; if allproc_lock happens to be contested in
fork1() (which can happen because it it is locked without Giant held
in some places, and because sleeping with an sx lock is allowed),
we'll go to sleep there, dropping Giant. This opens up a race, since
wait1() can now proceed until after the leavepgrp() before blocking;
when allproc_lock is released, fork1() will be the first to pick it
up, and this panic will happen.
Seems that I relied on Giant too much when I first took a look into
that code :)

	- Thomas

-- 
Thomas Moestl <tmoestl@gmx.net>	http://www.tu-bs.de/~y0015675/
              <tmm@FreeBSD.org>	http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0  9C0F 1FE6 4F1D 419C 776C

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message