From: Steven Schlansker
Date: Fri, 05 Oct 2007 11:56:38 -0700
To: freebsd-current@freebsd.org
Subject: Re: Repeatable kernel panic on -CURRENT using ZFS over SATA
Message-ID: <470688E6.30900@berkeley.edu>
In-Reply-To: <86abqzqjrp.fsf@ds4.des.no>
References: <4701FE7C.8020200@berkeley.edu> <20071002143044.GL1693@garage.freebsd.pl> <47028989.9080300@berkeley.edu> <4702A6DE.3080403@conducive.net> <86abqzqjrp.fsf@ds4.des.no>

Dag-Erling Smørgrav wrote:
> Bill Hacker writes:
>> Short answer - you are overstressing your very marginal hardware.
>
> You're completely off the mark.  Steven is experiencing a well-known bug
> in the ata driver.
>
> DES

In case I can be helpful, I would still like to debug this problem. Please
tell me if my constant whining at the list is constructive and helpful in
tracing this bug down :)  If it's not, I'd rather let you guys code than
answer my emails, but if I can be of any help I am willing.

Here's a dump that I captured using -CURRENT as of two nights ago:

Dump header from device /dev/da0s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 113577984B (108 MB)
  Blocksize: 512
  Dumptime: Fri Oct  5 00:37:08 2007
  Hostname: scotch.CSUA.Berkeley.EDU
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-CURRENT #1: Thu Oct  4 06:23:40 PDT 2007
    root@scotch.CSUA.Berkeley.EDU:/usr/obj/usr/src/sys/GENERIC
  Panic String: from debugger
  Dump Parity: 3604782152
  Bounds: 2
  Dump Status: good

Unread portion of the kernel message buffer:
ad12: FAILURE - device detached
subdisk12: detached
ad12: detached

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x2c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc07422d6
stack pointer           = 0x28:0xd9e98c58
frame pointer           = 0x28:0xd9e98c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
panic: from debugger
cpuid = 0
Uptime: 16m4s
Physical memory: 499 MB
Dumping 108 MB: 93 77 61 45 29 13

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc074d7ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc074da6b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc048cab7 in db_panic (addr=Could not find the frame base for "db_panic".
) at /usr/src/sys/ddb/db_command.c:433
#4  0xc048d4a5 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
#5  0xc048ec15 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#6  0xc07746f6 in kdb_trap (type=12, code=0, tf=0xd9e98c18)
    at /usr/src/sys/kern/subr_kdb.c:502
#7  0xc0a01aaf in trap_fatal (frame=0xd9e98c18, eva=44)
    at /usr/src/sys/i386/i386/trap.c:863
#8  0xc0a01ce3 in trap_pfault (frame=0xd9e98c18, usermode=0, eva=44)
    at /usr/src/sys/i386/i386/trap.c:785
#9  0xc0a02695 in trap (frame=0xd9e98c18) at /usr/src/sys/i386/i386/trap.c:463
#10 0xc09e81fb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc07422d6 in _mtx_lock_flags (m=0x1c, opts=0,
    file=0xc31edd67 "/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c", line=472)
    at /usr/src/sys/kern/kern_mutex.c:177
#12 0xc31e2fb4 in ?? ()
#13 0x0000001c in ?? ()
#14 0x00000000 in ?? ()
#15 0xc31edd67 in ?? ()
#16 0x000001d8 in ?? ()
#17 0xc788c5ac in ?? ()
#18 0xc31e2f70 in ?? ()
#19 0xc2d9c840 in ?? ()
#20 0xd9e98cbc in ?? ()
#21 0xc07b0d49 in biodone (bp=0x8) at /usr/src/sys/kern/vfs_bio.c:3009
Previous frame identical to this frame (corrupt stack?)
(kgdb) list *0xc07422d6
0xc07422d6 is in _mtx_lock_flags (/usr/src/sys/kern/kern_mutex.c:178).
173     void
174     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
175     {
176
177             MPASS(curthread != NULL);
178             KASSERT(m->mtx_lock != MTX_DESTROYED,
179                 ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
180             KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
181                 ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
182                 file, line));
(kgdb) list *0xc31e2fb4
No source file for address 0xc31e2fb4.
(kgdb) list *0xc07b0d49
0xc07b0d49 is in biodone (/usr/src/sys/kern/vfs_bio.c:3010).
3005            if (done == NULL)
3006                    wakeup(bp);
3007            mtx_unlock(&bdonelock);
3008            if (done != NULL)
3009                    done(bp);
3010    }
3011
3012    /*
3013     * Wait for a BIO to finish.
3014     *

Interestingly enough, I can't seem to get a useful backtrace... all of those
?? frames!  Perhaps someone who knows more about kernel debugging than I do
can step me through from here. I read the kernel debugging section of the
FreeBSD Handbook, but it was no help with what to do when the stack is
seemingly corrupt :)

I also have a dump from a time when I hotplugged a SATA drive and the system
instantly panicked on me. Hotplugging has usually worked, but that time it
just gave up. I'm not sure how interesting that dump is, though; I haven't
been able to reproduce it (granted, I haven't tried very hard).

-Steven