Date: Fri, 05 Oct 2007 11:56:38 -0700
From: Steven Schlansker <stevenschlansker@berkeley.edu>
To: freebsd-current@freebsd.org
Subject: Re: Repeatable kernel panic on -CURRENT using ZFS over SATA
Message-ID: <470688E6.30900@berkeley.edu>
In-Reply-To: <86abqzqjrp.fsf@ds4.des.no>
References: <4701FE7C.8020200@berkeley.edu>
    <20071002143044.GL1693@garage.freebsd.pl>
    <47028989.9080300@berkeley.edu>
    <4702A6DE.3080403@conducive.net>
    <86abqzqjrp.fsf@ds4.des.no>

Dag-Erling Smørgrav wrote:
> Bill Hacker <askbill@conducive.net> writes:
>> Short answer - you are overstressing your very marginal hardware.
>
> You're completely off the mark.  Steven is experiencing a well-known bug
> in the ata driver.
>
> DES

In case I can be helpful, I would still like to debug this problem.
Please tell me if my constant whining at the list is constructive and
helpful in tracing this bug down :)  If it's not, I'd rather let you guys
code than answer my emails, but if I can be of any help I am willing.

Here's a dump that I captured using -CURRENT as of two nights ago:

Dump header from device /dev/da0s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 113577984B (108 MB)
  Blocksize: 512
  Dumptime: Fri Oct  5 00:37:08 2007
  Hostname: scotch.CSUA.Berkeley.EDU
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-CURRENT #1: Thu Oct  4 06:23:40 PDT 2007
    root@scotch.CSUA.Berkeley.EDU:/usr/obj/usr/src/sys/GENERIC
  Panic String: from debugger
  Dump Parity: 3604782152
  Bounds: 2
  Dump Status: good

Unread portion of the kernel message buffer:
ad12: FAILURE - device detached
subdisk12: detached
ad12: detached

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x2c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc07422d6
stack pointer           = 0x28:0xd9e98c58
frame pointer           = 0x28:0xd9e98c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
panic: from debugger
cpuid = 0
Uptime: 16m4s
Physical memory: 499 MB
Dumping 108 MB: 93 77 61 45 29 13

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc074d7ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc074da6b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc048cab7 in db_panic (addr=Could not find the frame base for "db_panic".
) at /usr/src/sys/ddb/db_command.c:433
#4  0xc048d4a5 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
#5  0xc048ec15 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#6  0xc07746f6 in kdb_trap (type=12, code=0, tf=0xd9e98c18)
    at /usr/src/sys/kern/subr_kdb.c:502
#7  0xc0a01aaf in trap_fatal (frame=0xd9e98c18, eva=44)
    at /usr/src/sys/i386/i386/trap.c:863
#8  0xc0a01ce3 in trap_pfault (frame=0xd9e98c18, usermode=0, eva=44)
    at /usr/src/sys/i386/i386/trap.c:785
#9  0xc0a02695 in trap (frame=0xd9e98c18) at /usr/src/sys/i386/i386/trap.c:463
#10 0xc09e81fb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc07422d6 in _mtx_lock_flags (m=0x1c, opts=0,
    file=0xc31edd67 "/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c",
    line=472) at /usr/src/sys/kern/kern_mutex.c:177
#12 0xc31e2fb4 in ?? ()
#13 0x0000001c in ?? ()
#14 0x00000000 in ?? ()
#15 0xc31edd67 in ?? ()
#16 0x000001d8 in ?? ()
#17 0xc788c5ac in ?? ()
#18 0xc31e2f70 in ?? ()
#19 0xc2d9c840 in ?? ()
#20 0xd9e98cbc in ?? ()
#21 0xc07b0d49 in biodone (bp=0x8) at /usr/src/sys/kern/vfs_bio.c:3009
Previous frame identical to this frame (corrupt stack?)
(kgdb) list *0xc07422d6
0xc07422d6 is in _mtx_lock_flags (/usr/src/sys/kern/kern_mutex.c:178).
173     void
174     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
175     {
176
177             MPASS(curthread != NULL);
178             KASSERT(m->mtx_lock != MTX_DESTROYED,
179                 ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
180             KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
181                 ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
182                 file, line));
(kgdb) list *0xc31e2fb4
No source file for address 0xc31e2fb4.
(kgdb) list *0xc07b0d49
0xc07b0d49 is in biodone (/usr/src/sys/kern/vfs_bio.c:3010).
3005            if (done == NULL)
3006                    wakeup(bp);
3007            mtx_unlock(&bdonelock);
3008            if (done != NULL)
3009                    done(bp);
3010    }
3011
3012    /*
3013     * Wait for a BIO to finish.
3014     *

Interestingly enough, I can't seem to get a useful backtrace... all of
those ??? frames!  Perhaps someone who knows more about kernel debugging
than I do can step me through from here.  I read the kernel debugging
section of the FreeBSD handbook, but it does not say what to do when the
stack is seemingly corrupt :)

I also have a dump from a time when I hotplugged a SATA drive and it
instantly panicked on me.  Hotplugging has usually worked, but that time
it just gave up.  I'm not sure how interesting that dump is, though, as I
haven't been able to reproduce it (granted, I haven't tried very hard).

-Steven
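
One detail in the trace above can be checked by hand: the trap reports a fault virtual address of 0x2c (eva=44), while frame #11 shows m=0x1c. The following is a minimal sketch in Python of that arithmetic. It assumes that on i386 struct mtx starts with a 16-byte struct lock_object followed by mtx_lock; that layout is an assumption about this kernel's headers and is not shown anywhere in the dump. The class names LockObject and Mtx and the function fault_address are illustrative only.

```python
import ctypes


# Hypothetical model of the i386 layout. Field widths are assumptions
# (32-bit pointers and ints); they are not taken from the dump itself.
class LockObject(ctypes.Structure):
    _fields_ = [
        ("lo_name", ctypes.c_uint32),          # const char * on i386
        ("lo_type", ctypes.c_uint32),          # const char * on i386
        ("lo_flags", ctypes.c_uint32),         # u_int
        ("lo_witness_data", ctypes.c_uint32),  # pointer-sized union
    ]


class Mtx(ctypes.Structure):
    _fields_ = [
        ("lock_object", LockObject),
        ("mtx_lock", ctypes.c_uint32),     # word read by the first KASSERT
        ("mtx_recurse", ctypes.c_uint32),
    ]


def fault_address(m):
    """Address touched when reading m->mtx_lock for a given pointer m."""
    return m + Mtx.mtx_lock.offset


if __name__ == "__main__":
    print(hex(Mtx.mtx_lock.offset))  # offset of mtx_lock under the assumed layout
    print(hex(fault_address(0x1C)))  # compare with "fault virtual address" above
```

If the assumed layout holds, the faulting read would be the m->mtx_lock access in the KASSERT at kern_mutex.c:178, and m=0x1c itself looks more like a small member offset added to a NULL structure pointer in the caller (vdev_geom.c line 472) than like a corrupted mutex. That reading is a guess from the numbers, not something confirmed by the dump.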