Date: Sat, 13 Jan 2007 15:11:06 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Sven Willenberger <sven@dmv.com>
Cc: Kostik Belousov <kostikbel@gmail.com>, stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)
Message-ID: <20070113201106.GD66260@xor.obsecurity.org>
In-Reply-To: <4596F06D.30004@dmv.com>
References: <20061205.004323.78708386.hrs@allbsd.org>
    <20061204160949.GM35681@deviant.kiev.zoral.com.ua>
    <20061205.123805.59655403.hrs@allbsd.org>
    <1166194879.6317.11.camel@lanshark.dmv.com>
    <20061215181548.GA58555@xor.obsecurity.org>
    <1166209936.6317.21.camel@lanshark.dmv.com>
    <20061215192958.GA86926@xor.obsecurity.org>
    <20061215212040.GG23698@deviant.kiev.zoral.com.ua>
    <1166463200.11562.5.camel@lanshark.dmv.com>
    <4596F06D.30004@dmv.com>

On Sat, Dec 30, 2006 at 06:04:13PM -0500, Sven Willenberger wrote:
>
>
> Sven Willenberger presumably uttered the following on 12/18/06 12:33:
> > On Fri, 2006-12-15 at 23:20 +0200, Kostik Belousov wrote:
> >> On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:
> >
> > <<SNIP>>
> >
> >>>
> >>>> FWIW, I do see the following appearing in the /var/log/messages:
> >>>> ufs_rename: fvp == tvp (can't happen)
> >>>> about once or twice a day, but cannot correlate those to lockup. Now
> >>>> that I have enabled the options mentioned above in the kernel, I am
> >>>> seeing some LOR issues:
> >>>>
> >>>> kernel: lock order reversal:
> >>>> kernel: 1st 0xffffff00c3bab200 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> >>>> kernel: 2nd 0xffffff0005bb6078 struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138
> >>> OK, this is interesting, so let's proceed from here.
> >>>
> >>> Kris
> >> Try this.
> >>
> >> Index: ufs/ufs/ufs_vnops.c
> >> ===================================================================
> >> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> >> retrieving revision 1.283
> >> diff -u -r1.283 ufs_vnops.c
> >> --- ufs/ufs/ufs_vnops.c 6 Nov 2006 13:42:09 -0000 1.283
> >> +++ ufs/ufs/ufs_vnops.c 15 Dec 2006 21:19:51 -0000
> >> @@ -133,19 +133,15 @@
> >>  {
> >>      struct inode *ip;
> >>      struct timespec ts;
> >> -    int mnt_locked;
> >>
> >>      ip = VTOI(vp);
> >> -    mnt_locked = 0;
> >> -    if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
> >> -        VI_LOCK(vp);
> >> +    VI_LOCK(vp);
> >> +    if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> >>          goto out;
> >> +    if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
> >> +        VI_UNLOCK(vp);
> >> +        return;
> >>      }
> >> -    MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */
> >> -    mnt_locked = 1;
> >> -    VI_LOCK(vp);
> >> -    if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
> >> -        goto out_unl;
> >>
> >>      if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
> >>          ip->i_flag |= IN_LAZYMOD;
> >> @@ -172,10 +168,7 @@
> >>
> >>  out:
> >>      ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
> >> - out_unl:
> >>      VI_UNLOCK(vp);
> >> -    if (mnt_locked)
> >> -        MNT_IUNLOCK(vp->v_mount);
> >>  }
> >>
> >>  /*
> >
> >
> > Patch applied cleanly (offset 6 lines), make buildworld, make kernel,
> > reboot, make installworld, etc.
> >
> > kernel: lock order reversal:
> > kernel: 1st 0xffffff00b9181800 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> > kernel: 2nd 0xffffff00c16030d0 vnode interlock (vnode interlock) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:132
> >
> >
> >
> > _______________________________________________
>
> Having enabled witness and ddb, etc I cannot get this LOR to trigger anymore, but
> the machine is still locking up. I finally managed to get a piece of what was
> appearing on the console which is the following (copied by hand by an onsite tech so
> there may be a typo here and there):
>
> --------cut--------------
>
> bge_intr() at loge_intr+0x84a
> ithread_loop() at ithread_loop+0x14c
> fork_exit() at fork_exit+0xbb
> fork_trampoline() at fork_trampoline+0xee
> --- trap 0, rip-0, rsp-0xffffffffb371ad00, rbp-0 ---
>
> Fatal trap 12: page fault while in Kernel Mode
> cupid=1, apic id=01
> fault virtual address - 0x28
> fault code - supervisor write, page not present
> instruction pointer - 0x8:0xffffffff801dae1a
> stack pointer - 0x10:0xffffffffb371ab70
> frame pointer - 0x10:0xffffffffb371abd0
> code segment - base 0x0, limit 0xfffff, type 0x1b
> - DPL 0, pres 1, long 1, def32 0, gram 1
>
> processor eflags=interrupt enabled, resume, IOPL=0
> current process=28 (irq 24:bge0)
> trap number=12
> panic: page fault
> cupid=1
>
> Uptime - 4d10h52m36s
> Dumping 4031MB (2 chunks)
> chunk0: 1MB (156 pages)... ok
> chunk1: 4031MB (1031920)
>
> ----------cut-----------------
>
> For some reason, by the time it reboots, there is no dump file available (even
> though it is enabled in rc.conf and there is more than enough room in /var/crash to
> hold it).

This is indicating a problem either with your bge hardware or the driver.

Kris