Date: Mon, 15 Jan 2007 11:33:33 -0500
From: Sven Willenberger <sven@dmv.com>
To: Kris Kennaway <kris@obsecurity.org>
Cc: Kostik Belousov <kostikbel@gmail.com>, stable@freebsd.org
Subject: Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)
Message-ID: <1168878813.13113.2.camel@lanshark.dmv.com>
In-Reply-To: <20070113201106.GD66260@xor.obsecurity.org>
References: <20061205.004323.78708386.hrs@allbsd.org> <20061204160949.GM35681@deviant.kiev.zoral.com.ua> <20061205.123805.59655403.hrs@allbsd.org> <1166194879.6317.11.camel@lanshark.dmv.com> <20061215181548.GA58555@xor.obsecurity.org> <1166209936.6317.21.camel@lanshark.dmv.com> <20061215192958.GA86926@xor.obsecurity.org> <20061215212040.GG23698@deviant.kiev.zoral.com.ua> <1166463200.11562.5.camel@lanshark.dmv.com> <4596F06D.30004@dmv.com> <20070113201106.GD66260@xor.obsecurity.org>

On Sat, 2007-01-13 at 15:11 -0500, Kris Kennaway wrote:
> On Sat, Dec 30, 2006 at 06:04:13PM -0500, Sven Willenberger wrote:
> >
> >
> > Sven Willenberger presumably uttered the following on 12/18/06 12:33:
> > > On Fri, 2006-12-15 at 23:20 +0200, Kostik Belousov wrote:
> > >> On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:
> > >
> > > <<SNIP>>
> > >
> > >>>
> > >>>> FWIW, I do see the following appearing in the /var/log/messages:
> > >>>> ufs_rename: fvp == tvp (can't happen)
> > >>>> about once or twice a day, but cannot correlate those to lockup. Now
> > >>>> that I have enabled the options mentioned above in the kernel, I am
> > >>>> seeing some LOR issues:
> > >>>>
> > >>>> kernel: lock order reversal:
> > >>>> kernel: 1st 0xffffff00c3bab200 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> > >>>> kernel: 2nd 0xffffff0005bb6078 struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138
> > >>> OK, this is interesting, so let's proceed from here.
> > >>>
> > >>> Kris
> > >> Try this.
> > >>
> > >> Index: ufs/ufs/ufs_vnops.c
> > >> ===================================================================
> > >> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> > >> retrieving revision 1.283
> > >> diff -u -r1.283 ufs_vnops.c
> > >> --- ufs/ufs/ufs_vnops.c	6 Nov 2006 13:42:09 -0000	1.283
> > >> +++ ufs/ufs/ufs_vnops.c	15 Dec 2006 21:19:51 -0000
> > >> @@ -133,19 +133,15 @@
> > >>  {
> > >>  	struct inode *ip;
> > >>  	struct timespec ts;
> > >> -	int mnt_locked;
> > >>
> > >>  	ip = VTOI(vp);
> > >> -	mnt_locked = 0;
> > >> -	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
> > >> -		VI_LOCK(vp);
> > >> +	VI_LOCK(vp);
> > >> +	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> > >>  		goto out;
> > >> +	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
> > >> +		VI_UNLOCK(vp);
> > >> +		return;
> > >>  	}
> > >> -	MNT_ILOCK(vp->v_mount);		/* For reading of mnt_kern_flags. */
> > >> -	mnt_locked = 1;
> > >> -	VI_LOCK(vp);
> > >> -	if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
> > >> -		goto out_unl;
> > >>
> > >>  	if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
> > >>  		ip->i_flag |= IN_LAZYMOD;
> > >> @@ -172,10 +168,7 @@
> > >>
> > >>   out:
> > >>  	ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
> > >> - out_unl:
> > >>  	VI_UNLOCK(vp);
> > >> -	if (mnt_locked)
> > >> -		MNT_IUNLOCK(vp->v_mount);
> > >>  }
> > >>
> > >>  /*
> > >
> > >
> > > Patch applied cleanly (offset 6 lines), make buildworld, make kernel,
> > > reboot, make installworld, etc.
> > >
> > > kernel: lock order reversal:
> > > kernel: 1st 0xffffff00b9181800 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
> > > kernel: 2nd 0xffffff00c16030d0 vnode interlock (vnode interlock) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:132
> > >
> > >
> > >
> > > _______________________________________________
> >
> > Having enabled witness and ddb, etc I cannot get this LOR to trigger anymore, but
> > the machine is still locking up. I finally managed to get a piece of what was
> > appearing on the console which is the following (copied by hand by an onsite tech so
> > there may be a typo here and there):
> >
> > --------cut--------------
> >
> > bge_intr() at loge_intr+0x84a
> > ithread_loop() at ithread_loop+0x14c
> > fork_exit() at fork_exit+0xbb
> > fork_trampoline() at fork_trampoline+0xee
> > --- trap 0, rip-0, rsp-0xffffffffb371ad00, rbp-0 ---
> >
> > Fatal trap 12: page fault while in Kernel Mode
> > cupid=1, apic id=01
> > fault virtual address - 0x28
> > fault code - supervisor write, page not present
> > instruction pointer - 0x8:0xffffffff801dae1a
> > stack pointer - 0x10:0xffffffffb371ab70
> > frame pointer - 0x10:0xffffffffb371abd0
> > code segment - base 0x0, limit 0xfffff, type 0x1b
> > - DPL 0, pres 1, long 1, def32 0, gram 1
> >
> > processor eflags=interrupt enabled, resume, IOPL=0
> > current process=28 (irq 24:bge0)
> > trap number=12
> > panic: page fault
> > cupid=1
> >
> > Uptime - 4d10h52m36s
> > Dumping 4031MB (2 chunks)
> > chunk0: 1MB (156 pages)... ok
> > chunk1: 4031MB (1031920)
> >
> > ----------cut-----------------
> >
> > For some reason, by the time it reboots, there is no dump file available (even
> > though it is enabled in rc.conf and there is more than enough room in /var/crash to
> > hold it).
>
> This is indicating a problem either with your bge hardware or the driver.
>
> Kris

I suspect the driver: this same hardware setup was being used as a
database server with FreeBSD 5.4. I had been using the bge driver, but
set at base100T, without any issue at all. It was when I did a clean
install of 6.2-Prerelease and set bge to use the full gigE speed (via
autonegotiate) that these issues cropped up. Since changing to the
onboard fxp interface I have (knock on wood) not had an issue in some 7
days (as opposed to the random lockups/reboots occurring every 3 days
using bge).

Sven