From owner-freebsd-current Tue Jul 24 16:21:31 2001
Delivered-To: freebsd-current@freebsd.org
Received: from meow.osd.bsdi.com (meow.osd.bsdi.com [204.216.28.88])
    by hub.freebsd.org (Postfix) with ESMTP id 1EE1937B406
    for ; Tue, 24 Jul 2001 16:21:22 -0700 (PDT)
    (envelope-from jhb@FreeBSD.org)
Received: from laptop.baldwin.cx (john@jhb-laptop.osd.bsdi.com [204.216.28.241])
    by meow.osd.bsdi.com (8.11.4/8.11.2) with ESMTP id f6ONL9v40647;
    Tue, 24 Jul 2001 16:21:09 -0700 (PDT)
    (envelope-from jhb@FreeBSD.org)
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
Date: Tue, 24 Jul 2001 16:21:07 -0700 (PDT)
From: John Baldwin
To: Julian Elischer
Subject: RE: This look familiar to anyone? (bug in 4.11 maybe)
Cc: freebsd-current@FreeBSD.org
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
List-Archive: (Web Archive)
List-Help: (List Instructions)
X-Loop: FreeBSD.ORG

On 24-Jul-01 Julian Elischer wrote:
>
> In a VFS operation, %ecx gets corrupted (maybe from an interrupt?)
> between the instruction where it's loaded with a constant,
> and the instruction where it's used...
> It's always the same instruction,
> though often in DIFFERENT VFS instructions (fsync, bwrite so far)
>
> the trap frame usually looks like:
>
>#4 0xc0251813 in trap (frame={tf_fs = 0x10, tf_es = 0x10, tf_ds = 0x10,
>      tf_edi = 0x0, tf_esi = 0x1, tf_ebp = 0xc954de84,
>      tf_isp = 0xc954de50, tf_ebx = 0xc27d6d80, tf_edx = 0xc1344600,
>      tf_ecx = 0xc96145b2, tf_eax = 0xc954de78, tf_trapno = 0xc,
>      tf_err = 0x0, tf_eip = 0xc01846d9, tf_cs = 0x8, tf_eflags = 0x10286,
>      tf_esp = 0xc954de78, tf_ss = 0xc27d6d80})
>      at /usr/src/sys/i386/i386/trap.c:443
>#5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
>#6 0xc0189be2 in vop_stdbwrite (ap=0xc954deb4) at
>      /usr/src/sys/kern/vfs_default.c:319
>
>
> the code there looks like:
>
> (kgdb) up 5
>#5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
> 923             rc = VCALL(vp, VOFFSET(vop_strategy), &a);
> (kgdb) list
> 918             struct vop_strategy_args a;
> 919             int rc;
> 920             a.a_desc = VDESC(vop_strategy);
> 921             a.a_vp = vp;
> 922             a.a_bp = bp;
> 923             rc = VCALL(vp, VOFFSET(vop_strategy), &a);  <-------here
> 924             return (rc);
> 925     }
> 926     struct vop_print_args {
> 927             struct vnodeop_desc *a_desc;
>
> In Assembler:
>
> 0xc01846cc:  mov    0xc029dcc0,%ecx
> 0xc01846d2:  mov    0x18(%eax),%edx
> 0xc01846d5:  lea    0xfffffff4(%ebp),%eax
> 0xc01846d8:  push   %eax
> 0xc01846d9:  mov    (%edx,%ecx,4),%eax   <<<<< **POW**
> 0xc01846dc:  call   *%eax
> 0xc01846de:  add    $0x4,%esp
> 0xc01846e1:  mov    0xfffffff0(%ebp),%eax
>
> looking at the regs,
> dx = 0xc1344600,
> cx = 0xc96145b2,
> and
> C1344600+(4*C96145B2) = 3E6B95CC8
> the lower 32 bits of which is the same as the fault address
>
> but in the code above we see that %cx was just loaded from
> location 0xc029dcc0 which contains:
> (kgdb) x/x 0xc029dcc0
> 0xc029dcc0:  0x12
>
> 0x12 is the correct offset for a strategy call.
>
> so cx got corrupted between the instruction at 0xc01846cc
> and that at 0xc01846d9.

Very weird.
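The address arithmetic in the quoted message can be re-checked by hand or
with a few lines of Python. What follows is only a sketch: it assumes the
register values are exactly those shown in the trap frame, and the names
effective_address and MASK32 are made up for this illustration (they are
not kernel or debugger identifiers). It evaluates base + index*4 as the
faulting "mov (%edx,%ecx,4),%eax" would, truncated to 32 bits, once with
the %ecx value from the trap frame and once with the 0x12 that was loaded
from 0xc029dcc0.

```python
MASK32 = 0xFFFFFFFF  # i386 addresses are 32 bits wide


def effective_address(base: int, index: int, scale: int = 4) -> int:
    """Address formed by (%base,%index,scale): base + index*scale, mod 2**32."""
    return (base + index * scale) & MASK32


edx = 0xC1344600        # tf_edx from the trap frame
ecx_seen = 0xC96145B2   # tf_ecx from the trap frame (the unexpected value)
ecx_expected = 0x12     # contents of 0xc029dcc0 according to the x/x output

# Sum without truncation, as written out in the quoted message.
full_sum = edx + 4 * ecx_seen

print(f"untruncated sum      : {full_sum:#x}")
print(f"address with bad ecx : {effective_address(edx, ecx_seen):#x}")
print(f"address with 0x12    : {effective_address(edx, ecx_expected):#x}")
```

The untruncated sum reproduces the 3E6B95CC8 figure quoted above, and its
low 32 bits are what the message reports as the fault address. With 0x12
in %ecx the load would have stayed 0x48 bytes past the table pointed to by
%edx.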
Note that traps and interrupts will save %ecx in the trapframe, so
you aren't going to end up with it getting corrupted unless we somehow screw
up %ecx after popping the frame (or before pushing it).

> Note that the contents of cx (0xc96145b2) is an address
> somewhat higher than the kernel stack at the time in question.

Could be a stack of some other thread.  All the 0xc9XXXXX addresses are
pointers to automatic variables.  The 0xc0[2-4]XXXXX are return addresses.

> a dump of ram in that area shows:
> (kgdb) x/64xw 0xc96145a0
> 0xc96145a0:  0xc954e900    0xc9709c00  0xc0000000  0xc96145a8