From owner-freebsd-stable@FreeBSD.ORG Sun Jan 7 18:05:40 2007
X-Original-To: stable@FreeBSD.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 23D4416A403
    for ; Sun, 7 Jan 2007 18:05:40 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
    by mx1.freebsd.org (Postfix) with ESMTP id DBE8213C461
    for ; Sun, 7 Jan 2007 18:05:39 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
    by cyrus.watson.org (Postfix) with ESMTP id 2D83848D52;
    Sun, 7 Jan 2007 13:05:39 -0500 (EST)
Date: Sun, 7 Jan 2007 18:05:39 +0000 (GMT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: Ceri Davies
In-Reply-To: <20070107170014.GL7088@submonkey.net>
Message-ID: <20070107180257.I41371@fledge.watson.org>
References: <20070105111954.GA51511@submonkey.net>
    <20070105120539.H46119@fledge.watson.org>
    <20070105131528.GB7088@submonkey.net>
    <20070105133028.F98541@fledge.watson.org>
    <20070105150857.GC7088@submonkey.net>
    <20070106120040.N46119@fledge.watson.org>
    <20070106132540.GG7088@submonkey.net>
    <20070107114243.K41371@fledge.watson.org>
    <20070107170014.GL7088@submonkey.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: stable@FreeBSD.org
Subject: Re: (audit?) Panic in 6.2-PRERELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Sun, 07 Jan 2007 18:05:40 -0000

On Sun, 7 Jan 2007, Ceri Davies wrote:

>> Could you try printing *td->td_ar? Maybe this will give us a clue as to
>> how far it got. In particular, this may be able to more reliably give us
>> the file descriptor number, which is audited early in the system call.
>> You might find that 'td' is corrupted in many layers of the stack, keep
>> going up until you find one where it's good. It may well be that
>> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
>> correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
>> ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some
>> more about the file descriptor even though it appears to have vanished.
>
> *td->td_ar is null (0x0) in both cases...

I'm beginning to wonder whether this is audit-related at all. Something is
clearly not right: the audit code should never have been entered at that
point. Perhaps the stack trace corruption has misled us into thinking audit
is involved.

>> I'm quite worried by the fact that the file descriptor seems not to be
>> present any more -- this suggests a file descriptor related race of the
>> sort that is both quite difficult to figure out and also quite a risk. It's
>> strange that it would only trigger with audit, however--perhaps audit
>> stretches out the race. Is this an SMP box?
>
> It's certainly looking quite nasty. This system is UP hardware without
> options SMP.
>
> ...
>
> If it's at all useful, I can provide access to this system and the dumps.

Yes, I think at this point that would probably be the most helpful thing.
Could you also confirm that the kernel.debug you're using definitely matches
the kernel version in the core dump?

Robert N M Watson
Computer Laboratory
University of Cambridge