Message-Id: <200406130645.i5D6jF7q026079@gw.catspoiler.org>
Date: Sat, 12 Jun 2004 23:45:14 -0700 (PDT)
From: Don Lewis
To: rwatson@FreeBSD.org
cc: current@FreeBSD.org
cc: tjr@FreeBSD.org
Subject: Re: Fatal trap 12 in kern/kern_descrip.c:2346

On 13 Jun, Robert Watson wrote:
>
> On Sun, 13 Jun 2004, Tim Robbins wrote:
>
>> > Well, this is certainly a NULL pointer dereference in the sysctl code
>> > exporting file descriptor information to user space (perhaps for fstat?).
>> > The question is what is NULL. It looks like you have a dump -- could you
>> > convert sysctl_kern_file+0x105 to a line number? It's likely that it is
>> > line 2346 of kern_descrip.c, which follows the process pointer to its
>> > ucred. If so, could you use gdb on the dump to inspect *p?
>>
>> ISTR he included the output of "print *p" on his web page.
>>
>> I think the problem here is that we put processes onto the allproc list
>> in fork1() before they're properly initialised (or we unlock the allproc
>> sx too early.)
>
> Hmm. I noticed, though, that p_flag is set to P_CONTROLT and P_WEXIT, so
> my initial suspicion was actually exit1().

My initial suspicion was the kern_wait() code that sets p_ucred to NULL,
but the process has been removed from allproc by that point.

It also looks to me like fork1() is the culprit. The new process is put
on allproc at line 410, allproc_lock is dropped at line 412, the process
is locked at line 474, p_flag is cleared at line 509, and p_ucred is set
at line 521. Another clue is that p_state is PRS_NEW. Based on this, I'd
guess that sysctl_kern_file() is stumbling across this process while
fork1() is somewhere between lines 412 and 474.

I think the bzero()/bcopy() stuff has to happen before the new process
is added to allproc and p_ucred is set, otherwise there is the
possibility of an information leak between jails (p_comm[], etc.).

Why is sched_fork() called so early?
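
To make the window concrete, here is a minimal userland sketch of the ordering described above. It is a model only, not the kernel code: the struct layouts, the helper names fork_publish_early(), fork_finish() and count_procs_for_uid(), and the single-threaded setup with no allproc_lock are all assumptions made for illustration. It shows a process becoming visible on the list while p_ucred is still NULL, and one possible defensive check in a list walker (skip anything still in PRS_NEW), which is not necessarily the fix that should be committed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userland model; the names mirror the kernel's but this is not kernel code. */
enum proc_state { PRS_NEW, PRS_NORMAL };

struct ucred { int cr_uid; };

struct proc {
    enum proc_state p_state;
    struct ucred   *p_ucred;    /* NULL until the fork path sets it */
    char            p_comm[20];
    struct proc    *p_next;     /* link on the model "allproc" list */
};

static struct proc *allproc;    /* head of the model process list */

/* Hypothetical helper: publish the process before it is fully set up,
 * which is the ordering the mail attributes to fork1(). */
static void fork_publish_early(struct proc *p)
{
    memset(p, 0, sizeof(*p));   /* zero first, so no stale data is visible */
    p->p_state = PRS_NEW;
    p->p_next = allproc;
    allproc = p;                /* now visible to list walkers, p_ucred == NULL */
}

/* Hypothetical helper: the later part of the fork path. */
static void fork_finish(struct proc *p, struct ucred *cr, const char *comm)
{
    assert(cr != NULL);
    p->p_ucred = cr;
    strncpy(p->p_comm, comm, sizeof(p->p_comm) - 1);
    p->p_comm[sizeof(p->p_comm) - 1] = '\0';
    p->p_state = PRS_NORMAL;    /* only now is the process fully initialised */
}

/* Hypothetical list walker standing in for sysctl_kern_file(): it skips
 * half-built processes instead of following a NULL p_ucred. */
static int count_procs_for_uid(int uid)
{
    int n = 0;

    for (struct proc *p = allproc; p != NULL; p = p->p_next) {
        if (p->p_state == PRS_NEW || p->p_ucred == NULL)
            continue;           /* without this check, the next line faults */
        if (p->p_ucred->cr_uid == uid)
            n++;
    }
    return n;
}
```

Calling count_procs_for_uid() between fork_publish_early() and fork_finish() corresponds to the window between lines 412 and 474. The check in the walker only hides the symptom; moving the initialisation ahead of the list insertion would close the window for every walker at once.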