Date: Sun, 7 Jan 2007 17:00:14 +0000
From: Ceri Davies
To: Robert Watson
Cc: stable@FreeBSD.org
Subject: Re: (audit?) Panic in 6.2-PRERELEASE
Message-ID: <20070107170014.GL7088@submonkey.net>
In-Reply-To: <20070107114243.K41371@fledge.watson.org>

On Sun, Jan 07, 2007 at 11:49:56AM +0000, Robert Watson wrote:
> On Sat, 6 Jan 2007, Ceri Davies wrote:
>
> >>>So far it's happened this morning and yesterday morning.  I haven't seen
> >>>it before that.  I don't know the cause so I can't reproduce it at will,
> >>>but the logs don't give any indication.  Chances are that it will happen
> >>>again tomorrow, but we'll see.
> >>
> >>Hmm.  It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without
> >>the array index.  Could you repeat that, but with the array index --
> >>i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]?  Also, it would probably be
> >>useful to print uap->fd.  Right now you're printing stdin (index 0), but
> >>if the index is non-0, we want a different file.
> >
> >Very tactfully put :)  Sorry about that.
> >
> >None of the uap->fd's seem to be valid.  In the first case, uap->fd is way
> >too high for the length of fd_ofiles, which only has 21 elements:
> >
> >(kgdb) up 8
> >#8  0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075            error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = 89
> >(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> >Cannot access memory at address 0x0
> >
> >In the second, uap->fd is nonsense:
> >
> >(kgdb) up 8
> >#8  0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075            error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = -1023449232
> >(kgdb)
>
> Hmm.
> So, I reviewed audit_arg_file() closely, and after staring at the
> code a lot, couldn't see anything obvious in either the socket or the
> vnode/fifo case.  I did fix one other bug there, however, which can never
> actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and
> will MFC that in a week.

OK, thanks.

> Could you try printing *td->td_ar?  Maybe this will give us a clue as to
> how far it got.  In particular, this may be able to more reliably give us
> the file descriptor number, which is audited early in the system call.  You
> might find that 'td' is corrupted in many layers of the stack, keep going
> up until you find one where it's good.  It may well be that
> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
> correct still.  We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
> ARG_VNODE2 is set in the k_ar.ar_valid_arg field.  This may tell us some
> more about the file descriptor even though it appears to have vanished.

td->td_ar is a null pointer (0x0) in both cases...

> I'm quite worried by the fact that the file descriptor seems not to be
> present any more -- this suggests a file descriptor related race of the
> sort that is both quite difficult to figure out and also quite a risk.
> It's strange that it would only trigger with audit, however--perhaps audit
> stretches out the race.  Is this an SMP box?

It's certainly looking quite nasty.  This system is UP hardware without
options SMP.

> Could you print the entire contents of *td->td_proc->p_fd?

First case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc3441000, fd_ofileflags = 0xc3441100 "", fd_cdir = 0xc367f110,
  fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64, fd_map = 0xc3b65970, fd_lastfile = 20,
  fd_freefile = 16, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608, lo_list = {tqe_next = 0x0,
        tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0,
  fd_wanted = 0, fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}

Second case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc2d23600, fd_ofileflags = 0xc2d23700 "", fd_cdir = 0xc31b8660,
  fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64, fd_map = 0xc2e9c1c0, fd_lastfile = 20,
  fd_freefile = 17, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608, lo_list = {tqe_next = 0x0,
        tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0,
  fd_wanted = 0, fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}

If it's at all useful, I can provide access to this system and the dumps.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
                                                  -- Moliere