From owner-freebsd-current Sun Apr 28 21: 3:48 2002
Delivered-To: freebsd-current@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
    by hub.freebsd.org (Postfix) with ESMTP
    id AFBF937B41D; Sun, 28 Apr 2002 21:03:36 -0700 (PDT)
Received: from fledge.watson.org (fledge.pr.watson.org [192.0.2.3])
    by fledge.watson.org (8.11.6/8.11.6) with SMTP id g3T43Ow29942;
    Mon, 29 Apr 2002 00:03:24 -0400 (EDT)
    (envelope-from robert@fledge.watson.org)
Date: Mon, 29 Apr 2002 00:03:23 -0400 (EDT)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: current@FreeBSD.org
Cc: jeff@FreeBSD.org
Subject: Re: page fault in _mtx_lock_flags
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-260287369-1020053003=:64976"
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.
  Send mail to mime@docserver.cac.washington.edu for more info.

--0-260287369-1020053003=:64976
Content-Type: TEXT/PLAIN; charset=US-ASCII

If I apply the attached diff to kern_malloc.c, backing out a portion of
kern_malloc.c:1.99, the rate of panics plummets.  Previously, I could have
a box panic within five minutes of getting the crash boxes spinning.  Now
I've been going for about 40 minutes without any perceived failures (i.e.,
no panics).  I have no idea why this fixes the problem, but David
Wolfskill pointed me at that particular revision as being a source of
related problems for him.  I'm going to leave the boxes running overnight
and see what I bump into.  It would be nice to know if this is masking the
problem, or fixing the problem, and if so, why.
Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

On Sun, 28 Apr 2002, Robert Watson wrote:

> I also get an almost identical fault on crash1 involving mdconfig as
> opposed to sh:
>
> ray irq 10
> NFS ROOT: 192.168.50.1:/cboss/devel/nfsroot/crash1.cboss.tislabs.com
> 8.50.10 BroadcasP-Address 192.16
> t 192.168.50.255
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 1; lapic.id = 01000000
> fault virtual address   = 0x6b73697c
> fault code              = supervisor write, page not present
> instruction pointer     = 0x8:0xc02449b6
> stack pointer           = 0x10:0xc93d8a14
> frame pointer           = 0x10:0xc93d8a20
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 44 (mdconfig)
> kernel: type 12 trap, code=0
> Stopped at _mtx_lock_flags+0x42: lock cmpxchgl %ecx,0x18(%ebx)
> db> trace
> _mtx_lock_flags(6b736964,0,c03cb862,e3) at _mtx_lock_flags+0x42
> lockmgr(c93a8228,1000001,0,c8f27100) at lockmgr+0x42
> vfs_busy(c93a8200,0,0,c8f27100) at vfs_busy+0x58
> lookup(c93d8c28,0,c93b9c34,c93d8d20,c8f27100) at lookup+0x3a2
> namei(c93d8c28,0,c93b9c34,c93d8d20,0) at namei+0x1c8
> vn_open_cred(c93d8c28,c93d8bf4,0,c3f80c80,c93d8ce8) at vn_open_cred+0x23b
> vn_open(c93d8c28,c93d8bf4,0,c8f271dc,c8f27000) at vn_open+0x18
> open(c8f27100,c93d8d20,0,0,0) at open+0x158
> syscall(2f,2f,2f,0,0) at syscall+0x223
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (5, FreeBSD ELF, open), eip = 0x804950b, esp = 0xbfbffd14, ebp
> = 0xbfbffd50 ---
> db> Context switches not allowed in the debugger.
> db>
>
> Still not clear what the origin of this is -- possibly memory corruption
> of the mutex..?
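A side note on the corruption theory: in both traces the first argument to
_mtx_lock_flags (6b736964 here, 79747473 in the earlier report) is made up
entirely of printable ASCII bytes, and the fault address is that value plus
0x18, the displacement in the cmpxchgl operand. The sketch below unpacks such
a value the way it would sit in little-endian i386 memory. It is only an
illustration: decode_le32 and fault_addr are hypothetical helpers, not
anything from the kernel tree, and the 0x18 offset is taken from the
disassembly line rather than from the mutex structure definition.

```c
#include <stdint.h>

/*
 * Hypothetical helper: unpack a 32-bit value into the four bytes it
 * would occupy in little-endian (i386) memory, lowest address first.
 * Bytes outside the printable ASCII range are shown as '.'.
 */
static void
decode_le32(uint32_t v, char out[5])
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned char c = (v >> (8 * i)) & 0xff;

		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[4] = '\0';
}

/*
 * Hypothetical helper: address touched by "cmpxchgl %ecx,0x18(%ebx)"
 * when %ebx holds the given (bogus) mutex pointer.
 */
static uint32_t
fault_addr(uint32_t mtx_ptr)
{
	return (mtx_ptr + 0x18);
}
```

Given the two pointer values from the traces, decode_le32 produces "disk" and
"stty", and fault_addr gives 0x6b73697c and 0x7974748b, matching the reported
fault virtual addresses. That would be consistent with string data having
been written over the lock's interlock pointer, though it says nothing about
who wrote it.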
>
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> robert@fledge.watson.org      NAI Labs, Safeport Network Services
>
> On Sun, 28 Apr 2002, Robert Watson wrote:
>
> >
> > As usual, GENERIC -CURRENT head from last night, from the main tree.
> > Dual-proc SMP box netbooted using PXE.  System usually boots, does a
> > buildkernel -j 8 over NFS, then reboots and repeats.  This time it didn't.
> >
> > I actually have two boxes doing this, which does seem to double the rate
> > of panics I get.
> >
> > APIC_IO: Testing 8254 interrupt delivery
> > APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
> > APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
> > ad0: 19458MB [39535/16/63] at ata0-master UDMA33
> > acd0: CDROM at ata1-master PIO4
> > doSuMnPt:i nAgP rCoPoUt #f1r oLma unnfcsh:etsray irq 10
> > NFS ROOT: 192.168.50.1:/cboss/devel/nfsroot/crash1.cboss.tislabs.com
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; lapic.id = 00000000
> > fault virtual address   = 0x7974748b
> > fault code              = supervisor write, page not present
> > instruction pointer     = 0x8:0xc02449b6
> > stack pointer           = 0x10:0xc93dea14
> > frame pointer           = 0x10:0xc93dea20
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 41 (sh)
> > kernel: type 12 trap, code=0
> > Stopped at _mtx_lock_flags+0x42: lock cmpxchgl %ecx,0x18(%ebx)
> > db> trace
> > _mtx_lock_flags(79747473,0,c03cb862,e3) at _mtx_lock_flags+0x42
> > lockmgr(c93a8228,1000001,0,c8f27100) at lockmgr+0x42
> > vfs_busy(c93a8200,0,0,c8f27100) at vfs_busy+0x58
> > lookup(c93dec28,1a4,c8f03034,c93ded20,c8f27100) at lookup+0x3a2
> > namei(c93dec28,1a4,c8f03034,c93ded20,0) at namei+0x1c8
> > vn_open_cred(c93dec28,c93debf4,1a4,c3f80c80,c93dece8) at vn_open_cred+0x67
> > vn_open(c93dec28,c93debf4,1a4,c8f271dc,c8f27000) at vn_open+0x18
> > open(c8f27100,c93ded20,8125005,0,0) at open+0x158
> > syscall(2f,2f,2f,0,0) at syscall+0x223
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (5, FreeBSD ELF, open), eip = 0x808969b, esp = 0xbfbff8f0, ebp
> > = 0xbfbff91c ---
> > db>
> >
> > (kgdb) l *_mtx_lock_flags+0x42
> > 0xc02449b6 is in _mtx_lock_flags (machine/atomic.h:139).
> > 134     static __inline int
> > 135     atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
> > 136     {
> > 137             int res = exp;
> > 138
> > 139             __asm __volatile (
> > 140             " " __XSTRING(MPLOCKED) " "
> > 141             " cmpxchgl %1,%2 ; "
> > 142             " setz %%al ; "
> > 143             " movzbl %%al,%0 ; "
> > (gdb) l *lockmgr+0x42
> > 0xc0242376 is in lockmgr (../../../kern/kern_lock.c:228).
> > 223                     pid = LK_KERNPROC;
> > 224             else
> > 225                     pid = td->td_proc->p_pid;
> > 226
> > 227             mtx_lock(lkp->lk_interlock);
> > 228             if (flags & LK_INTERLOCK) {
> > 229                     mtx_assert(interlkp, MA_OWNED | MA_NOTRECURSED);
> > 230                     mtx_unlock(interlkp);
> > 231             }
> > 232
> >
> > Attempts to get into serial gdb failed:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; lapic.id = 01000000
> > fault virtual address   = 0x6aa
> > fault code              = supervisor read, page not present
> > instruction pointer     = 0x8:0xc93debf4
> > stack pointer           = 0x10:0xc93debd4
> > frame pointer           = 0x10:0xc93dec28
> > tokdke nselg trnatp 1=2 waith 0ixn0terlruptts 0dxisfablfed
> > cpan ic: bblo ck a=b leP Lsle,epp rlosc k1 ,(sdleefep2 m1ut egx)a
> > pro
> > ssroclescsor e../a.g./ .=. /ii38e6/iu386 /etnraapl.cd:,7 11e
> > pcmeu, I O=P L0 ;= l0
> > ccu.rrde =t 0p00o0000s0
> > "Deb1u g(gsehr)(
> > $T0b08:f4eb3dc9;05:28ec3dc9;04:d4eb3dc9;#01~
> >
> > I'm guessing that I'm dealing with an smp/locking issue there, but
> > unfortunately I didn't get much further:
> >
> > (kgdb) target remote /dev/cuaa0
> > Remote debugging using /dev/cuaa0
> > 0xc93debf4 in ?? ()
> > (kgdb) bt
> > #0  0xc93debf4 in ?? ()
> > #1  0x0 in ?? ()
> >
> > Normally getting into serial gdb works OK, perhaps there's an interaction
> > from the mutex code.
> >
> > Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
> > robert@fledge.watson.org      NAI Labs, Safeport Network Services
> >
> >
> > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > with "unsubscribe freebsd-current" in the body of the message
> >
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-current" in the body of the message
>

--0-260287369-1020053003=:64976
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="kern_malloc.c.diff"
Content-Transfer-Encoding: BASE64
Content-ID:
Content-Description: kern_malloc.c.diff

SW5kZXg6IGtlcm5fbWFsbG9jLmMNCj09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0N
ClJDUyBmaWxlOiAvaG9tZS9uY3ZzL3NyYy9zeXMva2Vybi9rZXJuX21hbGxv
Yy5jLHYNCnJldHJpZXZpbmcgcmV2aXNpb24gMS4xMDANCmRpZmYgLXUgLXIx
LjEwMCBrZXJuX21hbGxvYy5jDQotLS0ga2Vybl9tYWxsb2MuYwkyMyBBcHIg
MjAwMiAxODo1MDoyNSAtMDAwMAkxLjEwMA0KKysrIGtlcm5fbWFsbG9jLmMJ
MjkgQXByIDIwMDIgMDM6NTg6MDkgLTAwMDANCkBAIC05MCw3ICs5MCw3IEBA
DQogI2RlZmluZSBLTUVNX1pCQVNFCTE2DQogI2RlZmluZSBLTUVNX1pNQVNL
CShLTUVNX1pCQVNFIC0gMSkNCiANCi0jZGVmaW5lIEtNRU1fWk1BWAk4MTky
DQorI2RlZmluZSBLTUVNX1pNQVgJNjU1MzYNCiAjZGVmaW5lIEtNRU1fWlNJ
WkUJKEtNRU1fWk1BWCA+PiBLTUVNX1pTSElGVCkNCiBzdGF0aWMgdV9pbnQ4
X3Qga21lbXNpemVbS01FTV9aU0laRSArIDFdOw0KIA0KQEAgLTExMCw2ICsx
MTAsOCBAQA0KIAl7MjA0OCwgIjIwNDgiLCBOVUxMfSwNCiAJezQwOTYsICI0
MDk2IiwgTlVMTH0sDQogCXs4MTkyLCAiODE5MiIsIE5VTEx9LA0KKwl7MzI3
NjgsICIzMjc2OCIsIE5VTEx9LA0KKwl7NjU1MzYsICI2NTUzNiIsIE5VTEx9
LA0KIAl7MCwgTlVMTH0sDQogfTsNCiANCg==
--0-260287369-1020053003=:64976--

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message