From owner-freebsd-current@FreeBSD.ORG  Sun Jun 13 06:47:15 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id E343716A4CE; Sun, 13 Jun 2004 06:47:15 +0000 (GMT)
Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 7AB0A43D46; Sun, 13 Jun 2004 06:47:15 +0000 (GMT)
	(envelope-from truckman@FreeBSD.org)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i5D6jF7q026079;
	Sat, 12 Jun 2004 23:45:19 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <200406130645.i5D6jF7q026079@gw.catspoiler.org>
Date: Sat, 12 Jun 2004 23:45:14 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: rwatson@FreeBSD.org
In-Reply-To: <Pine.NEB.3.96L.1040613004127.1617A-100000@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: current@FreeBSD.org
cc: tjr@FreeBSD.org
Subject: Re: Fatal trap 12 in kern/kern_descrip.c:2346
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Sun, 13 Jun 2004 06:47:16 -0000

On 13 Jun, Robert Watson wrote:
> 
> On Sun, 13 Jun 2004, Tim Robbins wrote:
> 
>> > Well, this is certainly a NULL pointer dereference in the sysctl code
>> > exporting file descriptor information to user space (perhaps for fstat?). 
>> > The question is what is NULL.  It looks like you have a dump -- could you
>> > convert sysctl_kern_file+0x105 to a line number?  It's likely that it is
>> > line 2346 of kern_descrip.c, which follows the process pointer to its
>> > ucred.  If so, could you use gdb on the dump to inspect *p?
>> 
>> ISTR he included the output of "print *p" on his web page.
>> 
>> I think the problem here is that we put processes onto the allproc list
>> in fork1() before they're properly initialised (or we unlock the allproc
>> sx too early.) 
> 
> Hmm.  I noticed, though, that p_flag is set to P_CONTROLT and P_WEXIT, so
> my initial suspicion was actually exit1().

My initial suspicion was the kern_wait() code that sets p_ucred to NULL,
but the process has been removed from allproc by that point.

It also looks to me like fork1() is the culprit.  The new process is put
on allproc at line 410, allproc_lock is dropped at line 412, the process
is locked at line 474, p_flag is cleared at line 509, and p_ucred is set
at line 521.  Another clue is that p_state is PRS_NEW.  Based on this,
I'd guess that sysctl_kern_file() is stumbling across this process while
fork1() is somewhere between lines 412 and 474.
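
To make the window concrete, here is a minimal user-space sketch of a
list walker that refuses to touch an entry that is still under
construction.  Everything prefixed with model_/MODEL_ is a hypothetical
stand-in invented for this illustration; it is not the kernel's
struct proc or the real sysctl_kern_file(), and it is single-threaded,
so it shows only the check, not the locking a real fix would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
enum model_state { MODEL_PRS_NEW, MODEL_PRS_NORMAL };

struct model_ucred {
	int cr_uid;
};

struct model_proc {
	enum model_state    p_state;
	struct model_ucred *p_ucred;	/* NULL until the fork path sets it */
	struct model_proc  *p_next;
};

/*
 * Walk the list and count the entries whose credential is safe to
 * dereference.  Entries still marked MODEL_PRS_NEW are skipped and
 * tallied in *skipped instead of being dereferenced.
 */
static int
model_count_visible(const struct model_proc *head, int *skipped)
{
	int n = 0;

	*skipped = 0;
	for (const struct model_proc *p = head; p != NULL; p = p->p_next) {
		if (p->p_state == MODEL_PRS_NEW) {
			(*skipped)++;
			continue;
		}
		/* Past the state check the credential must be present. */
		assert(p->p_ucred != NULL);
		(void)p->p_ucred->cr_uid;
		n++;
	}
	return (n);
}
```

Without the state check, the walker would follow a NULL p_ucred on the
half-built entry, which is the failure mode I am guessing at above.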

I think the bzero()/bcopy() stuff has to happen before the new process
is added to allproc and p_ucred is set; otherwise there is the
possibility of an information leak between jails (p_comm[], etc.).
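
The ordering I have in mind can be sketched in user space as
"initialize first, publish last".  The demo_ names, the DEMO_COMMLEN
value, and the single-linked list are assumptions made up for this
example, not the kernel's structures, and the sketch ignores locking
entirely.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_COMMLEN 19		/* hypothetical name length for the sketch */

struct demo_ucred {
	int cr_uid;
};

struct demo_proc {
	char               p_comm[DEMO_COMMLEN + 1];
	struct demo_ucred *p_ucred;
	struct demo_proc  *p_next;
};

/*
 * Fully initialize a (possibly recycled) structure before linking it
 * onto the list that other code walks.
 */
static void
demo_publish(struct demo_proc **head, struct demo_proc *np,
    const struct demo_proc *parent)
{
	/* 1. Clear whatever the previous owner left behind. */
	memset(np, 0, sizeof(*np));
	/* 2. Copy the fields inherited from the parent. */
	memcpy(np->p_comm, parent->p_comm, sizeof(np->p_comm));
	/* 3. Set the credential. */
	np->p_ucred = parent->p_ucred;
	/* 4. Only now make the entry reachable from the list head. */
	np->p_next = *head;
	*head = np;
}
```

With this order, a walker that reaches the entry never sees stale
p_comm[] contents or a NULL credential.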

Why is sched_fork() called so early?