From owner-freebsd-stable@FreeBSD.ORG Sun Jan 7 11:49:57 2007
Return-Path:
X-Original-To: stable@FreeBSD.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6A1F516A412
	for ; Sun, 7 Jan 2007 11:49:57 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 3F6E013C441
	for ; Sun, 7 Jan 2007 11:49:57 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id A3B6F4C2AC;
	Sun, 7 Jan 2007 06:49:56 -0500 (EST)
Date: Sun, 7 Jan 2007 11:49:56 +0000 (GMT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: Ceri Davies
In-Reply-To: <20070106132540.GG7088@submonkey.net>
Message-ID: <20070107114243.K41371@fledge.watson.org>
References: <20070105111954.GA51511@submonkey.net>
	<20070105120539.H46119@fledge.watson.org>
	<20070105131528.GB7088@submonkey.net>
	<20070105133028.F98541@fledge.watson.org>
	<20070105150857.GC7088@submonkey.net>
	<20070106120040.N46119@fledge.watson.org>
	<20070106132540.GG7088@submonkey.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: stable@FreeBSD.org
Subject: Re: (audit?) Panic in 6.2-PRERELEASE
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 07 Jan 2007 11:49:57 -0000

On Sat, 6 Jan 2007, Ceri Davies wrote:

>>> So far it's happened this morning and yesterday morning. I haven't seen
>>> it before that. I don't know the cause so I can't reproduce it at will,
>>> but the logs don't give any indication. Chances are that it will happen
>>> again tomorrow, but we'll see.
>>
>> Hmm. It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without the
>> array index. Could you repeat that, but with the array index -- i.e.,
>> td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be useful
>> to print uap->fd. Right now you're printing stdin (index 0), but if the
>> index is non-0, we want a different file.
>
> Very tactfully put :) Sorry about that.
>
> None of the uap->fd's seem to be valid. In the first case, uap->fd is way
> too high for the length of fd_ofiles, which only has 21 elements:
>
> (kgdb) up 8
> #8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at /usr/src/sys/kern/kern_descrip.c:1075
> 1075 error = kern_fstat(td, uap->fd, &ub);
> (kgdb) p uap->fd
> $1 = 89
> (kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> Cannot access memory at address 0x0
>
> In the second, uap->fd is nonsense:
>
> (kgdb) up 8
> #8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at /usr/src/sys/kern/kern_descrip.c:1075
> 1075 error = kern_fstat(td, uap->fd, &ub);
> (kgdb) p uap->fd
> $1 = -1023449232
> (kgdb)

Hmm. So, I reviewed audit_arg_file() closely, and after staring at the code
a lot, couldn't see anything obvious in either the socket or the vnode/fifo
case. I did fix one other bug there, however, which can never actually be
exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and will MFC
that in a week.

Could you try printing *td->td_ar? Maybe this will give us a clue as to how
far it got. In particular, this may be able to more reliably give us the
file descriptor number, which is audited early in the system call. You
might find that 'td' is corrupted in many layers of the stack; keep going
up until you find one where it's good. It may well be that
td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some
more about the file descriptor even though it appears to have vanished.

I'm quite worried by the fact that the file descriptor seems not to be
present any more -- this suggests a file descriptor related race of the
sort that is both quite difficult to figure out and also quite a risk.
It's strange that it would only trigger with audit, however--perhaps audit
stretches out the race. Is this an SMP box? Could you print the entire
contents of *td->td_proc->p_fd?

Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge
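
Note: the two sanity checks discussed in this thread (is the descriptor inside the open file table, and which argument bits are set in a valid-argument mask) can be sketched in userland C roughly as follows. This is only an illustration under stated assumptions: the sk_ types, the sk_ functions, and the SK_ARG_ flag values are hypothetical stand-ins, not the kernel's struct filedesc, its audit record, or the real ARG_* values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real ARG_* values live in the kernel's
 * audit headers and are NOT reproduced here. */
#define SK_ARG_FD       0x1ULL
#define SK_ARG_VNODE1   0x2ULL
#define SK_ARG_VNODE2   0x4ULL
#define SK_ARG_SOCKINFO 0x8ULL

/* Hypothetical stand-in for an open file. */
struct sk_file {
	int type;
};

/* Hypothetical stand-in for a per-process descriptor table. */
struct sk_filedesc {
	struct sk_file **ofiles;	/* table of open file pointers */
	int nfiles;			/* number of slots in the table */
};

/*
 * Return the file behind fd, or NULL when fd is negative, beyond the
 * end of the table, or refers to an empty slot.  Indexing the table
 * without this range check reads unrelated memory.
 */
static struct sk_file *
sk_lookup(const struct sk_filedesc *fdp, int fd)
{
	if (fd < 0 || fd >= fdp->nfiles)
		return (NULL);
	return (fdp->ofiles[fd]);
}

/* Return non-zero when the given flag bit is set in the mask. */
static int
sk_arg_is_set(uint64_t valid_arg, uint64_t flag)
{
	return ((valid_arg & flag) != 0);
}
```

With a 21-slot table, sk_lookup() rejects both 89 and -1023449232 before touching the array, which is the comparison being done by hand in the kgdb session above.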