Date:      Sun, 7 Jan 2007 17:00:14 +0000
From:      Ceri Davies <ceri@submonkey.net>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        stable@FreeBSD.org
Subject:   Re: (audit?) Panic in 6.2-PRERELEASE
Message-ID:  <20070107170014.GL7088@submonkey.net>
In-Reply-To: <20070107114243.K41371@fledge.watson.org>
References:  <20070105111954.GA51511@submonkey.net> <20070105120539.H46119@fledge.watson.org> <20070105131528.GB7088@submonkey.net> <20070105133028.F98541@fledge.watson.org> <20070105150857.GC7088@submonkey.net> <20070106120040.N46119@fledge.watson.org> <20070106132540.GG7088@submonkey.net> <20070107114243.K41371@fledge.watson.org>

On Sun, Jan 07, 2007 at 11:49:56AM +0000, Robert Watson wrote:
> On Sat, 6 Jan 2007, Ceri Davies wrote:
>
> >>>So far it's happened this morning and yesterday morning.  I haven't seen
> >>>it before that.  I don't know the cause so I can't reproduce it at will,
> >>>but the logs don't give any indication.  Chances are that it will happen
> >>>again tomorrow, but we'll see.
> >>
> >>Hmm.  It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without
> >>the array index.  Could you repeat that, but with the array index --
> >>i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]?  Also, it would probably be
> >>useful to print uap->fd.  Right now you're printing stdin (index 0), but
> >>if the index is non-0, we want a different file.
> >
> >Very tactfully put :)  Sorry about that.
> >
> >None of the uap->fd's seem to be valid. In the first case, uap->fd is way
> >too high for the length of fd_ofiles, which only has 21 elements:
> >
> >(kgdb) up 8
> >#8  0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075            error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = 89
> >(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> >Cannot access memory at address 0x0
> >
> >In the second, uap->fd is nonsense:
> >
> >(kgdb) up 8
> >#8  0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075            error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = -1023449232
> >(kgdb)
>
> Hmm.  So, I reviewed audit_arg_file() closely, and after staring at the
> code a lot, couldn't see anything obvious in either the socket or the
> vnode/fifo case.  I did fix one other bug there, however, which can never
> actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and
> will MFC that in a week.

OK, thanks.

> Could you try printing *td->td_ar?  Maybe this will give us a clue as to
> how far it got.  In particular, this may be able to more reliably give us
> the file descriptor number, which is audited early in the system call.  You
> might find that 'td' is corrupted in many layers of the stack, keep going
> up until you find one where it's good.  It may well be that
> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
> correct still.  We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
> ARG_VNODE2 is set in the k_ar.ar_valid_arg field.  This may tell us some
> more about the file descriptor even though it appears to have vanished.

*td->td_ar is null (0x0) in both cases...
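
For reference, the check being asked for can be sketched in userland C
roughly as follows.  This is only a model under stated assumptions: the
struct layouts are cut down to the two fields named above, the ARG_* bit
values are placeholders rather than the kernel's real definitions, and
describe_fd_kind() is a hypothetical helper.  With td_ar null, as here,
there is simply nothing to test.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; these are NOT the
 * values the kernel headers define for these flags. */
#define ARG_SOCKINFO (UINT64_C(1) << 0)
#define ARG_VNODE1   (UINT64_C(1) << 1)
#define ARG_VNODE2   (UINT64_C(1) << 2)

/* Cut-down stand-ins for the audit record reached via td->td_ar. */
struct audit_record_model {
    uint64_t ar_valid_arg;  /* bitmask of arguments that were recorded */
    int      ar_arg_fd;     /* descriptor number captured early in the call */
};

struct kaudit_record_model {
    struct audit_record_model k_ar;
};

/* Hypothetical helper: report what the valid-argument mask says about
 * the descriptor, or that no record is attached to the thread. */
static const char *
describe_fd_kind(const struct kaudit_record_model *ar)
{
    if (ar == NULL)
        return "no audit record";       /* the td_ar == 0x0 case */
    if (ar->k_ar.ar_valid_arg & ARG_SOCKINFO)
        return "socket";
    if (ar->k_ar.ar_valid_arg & (ARG_VNODE1 | ARG_VNODE2))
        return "vnode";
    return "unknown";
}
```

Passing NULL, which models both dumps, yields "no audit record".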

> I'm quite worried by the fact that the file descriptor seems not to be
> present any more -- this suggests a file descriptor related race of the
> sort that is both quite difficult to figure out and also quite a risk.
> It's strange that it would only trigger with audit, however--perhaps audit
> stretches out the race.  Is this an SMP box?

It's certainly looking quite nasty.  This system is UP hardware without
options SMP.

> Could you print the entire contents of *td->td_proc->p_fd?

First case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc3441000, fd_ofileflags = 0xc3441100 "", fd_cdir = 0xc367f110,
  fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64, fd_map = 0xc3b65970, fd_lastfile = 20,
  fd_freefile = 16, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608, lo_list = {tqe_next = 0x0,
        tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0,
  fd_wanted = 0, fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}

Second case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc2d23600, fd_ofileflags = 0xc2d23700 "", fd_cdir = 0xc31b8660,
  fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64, fd_map = 0xc2e9c1c0, fd_lastfile = 20,
  fd_freefile = 17, fd_cmask = 63, fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608, lo_list = {tqe_next = 0x0,
        tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0,
  fd_wanted = 0, fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0, fd_holdleaderswakeup = 0}
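
To make the mismatch concrete: both dumps show fd_nfiles = 64 and
fd_lastfile = 20, while uap->fd was 89 in one case and -1023449232 in the
other.  A minimal userland sketch of the range check, assuming a cut-down
model of struct filedesc and a hypothetical helper fd_is_plausible()
(neither is kernel code), might look like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userland stand-in for the kernel's struct filedesc;
 * only the fields needed for the check are modelled. */
struct filedesc_model {
    void **fd_ofiles;    /* table of open-file pointers */
    int    fd_nfiles;    /* number of slots allocated in fd_ofiles */
    int    fd_lastfile;  /* highest descriptor currently in use */
};

/* Hypothetical helper: true only if fd indexes a slot inside the
 * table, at or below fd_lastfile, and that slot is non-NULL. */
static bool
fd_is_plausible(const struct filedesc_model *fdp, int fd)
{
    if (fd < 0 || fd >= fdp->fd_nfiles)
        return false;               /* outside the allocated table */
    if (fd > fdp->fd_lastfile)
        return false;               /* above the highest open descriptor */
    return fdp->fd_ofiles[fd] != NULL;
}
```

With fd_nfiles = 64 and fd_lastfile = 20, both 89 and -1023449232 fail the
first test, so neither value can index a slot in the table at all.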

If it's at all useful, I can provide access to this system and the
dumps.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
                                                  -- Moliere
