From owner-freebsd-current@FreeBSD.ORG  Fri Oct  5 18:56:40 2007
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9E00616A418
	for <freebsd-current@freebsd.org>; Fri,  5 Oct 2007 18:56:40 +0000 (UTC)
	(envelope-from stevenschlansker@berkeley.edu)
Received: from smtp-out1.berkeley.edu (smtp-out1.Berkeley.EDU [128.32.61.106])
	by mx1.freebsd.org (Postfix) with ESMTP id 8149E13C4B9
	for <freebsd-current@freebsd.org>; Fri,  5 Oct 2007 18:56:40 +0000 (UTC)
	(envelope-from stevenschlansker@berkeley.edu)
Received: from 209-204-139-199.dsl.dynamic.sonic.net ([209.204.139.199]
	helo=[192.168.42.3])
	by fe6.calmail with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.68)
	(auth plain:stevenschlansker@berkeley.edu)
	(envelope-from <stevenschlansker@berkeley.edu>) id 1IdsLo-0003Iz-KR
	for freebsd-current@freebsd.org; Fri, 05 Oct 2007 11:56:40 -0700
Message-ID: <470688E6.30900@berkeley.edu>
Date: Fri, 05 Oct 2007 11:56:38 -0700
From: Steven Schlansker <stevenschlansker@berkeley.edu>
User-Agent: Thunderbird 2.0.0.6 (X11/20070924)
MIME-Version: 1.0
To: freebsd-current@freebsd.org
References: <4701FE7C.8020200@berkeley.edu>	<20071002143044.GL1693@garage.freebsd.pl>	<47028989.9080300@berkeley.edu>
	<4702A6DE.3080403@conducive.net> <86abqzqjrp.fsf@ds4.des.no>
In-Reply-To: <86abqzqjrp.fsf@ds4.des.no>
X-Enigmail-Version: 0.95.3
OpenPGP: id=40BFF7A7;
	url=subkeys.pgp.net
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: Re: Repeatable kernel panic on -CURRENT using ZFS over SATA

Dag-Erling Smørgrav wrote:
> Bill Hacker <askbill@conducive.net> writes:
>> Short answer - you are overstressing your very marginal hardware.
> 
> You're completely off the mark.  Steven is experiencing a well-known bug
> in the ata driver.
> 
> DES


In case I can be helpful, I would still like to debug this problem.


Please tell me whether my constant whining at the list is constructive and
helpful in tracking this bug down :)
If it's not, I'd rather let you guys code than answer my emails, but if
I can be of any help, I am willing.

Here's a dump that I captured using -CURRENT as of two nights ago:

Dump header from device /dev/da0s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 113577984B (108 MB)
  Blocksize: 512
  Dumptime: Fri Oct  5 00:37:08 2007
  Hostname: scotch.CSUA.Berkeley.EDU
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-CURRENT #1: Thu Oct  4 06:23:40 PDT 2007
    root@scotch.CSUA.Berkeley.EDU:/usr/obj/usr/src/sys/GENERIC
  Panic String: from debugger
  Dump Parity: 3604782152
  Bounds: 2
  Dump Status: good


Unread portion of the kernel message buffer:
ad12: FAILURE - device detached
subdisk12: detached
ad12: detached


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x2c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc07422d6
stack pointer           = 0x28:0xd9e98c58
frame pointer           = 0x28:0xd9e98c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
panic: from debugger
cpuid = 0
Uptime: 16m4s
Physical memory: 499 MB
Dumping 108 MB: 93 77 61 45 29 13

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc074d7ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc074da6b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc048cab7 in db_panic (addr=Could not find the frame base for "db_panic".
) at /usr/src/sys/ddb/db_command.c:433
#4  0xc048d4a5 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
#5  0xc048ec15 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#6  0xc07746f6 in kdb_trap (type=12, code=0, tf=0xd9e98c18) at /usr/src/sys/kern/subr_kdb.c:502
#7  0xc0a01aaf in trap_fatal (frame=0xd9e98c18, eva=44) at /usr/src/sys/i386/i386/trap.c:863
#8  0xc0a01ce3 in trap_pfault (frame=0xd9e98c18, usermode=0, eva=44) at /usr/src/sys/i386/i386/trap.c:785
#9  0xc0a02695 in trap (frame=0xd9e98c18) at /usr/src/sys/i386/i386/trap.c:463
#10 0xc09e81fb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc07422d6 in _mtx_lock_flags (m=0x1c, opts=0,
    file=0xc31edd67 "/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c", line=472)
    at /usr/src/sys/kern/kern_mutex.c:177
#12 0xc31e2fb4 in ?? ()
#13 0x0000001c in ?? ()
#14 0x00000000 in ?? ()
#15 0xc31edd67 in ?? ()
#16 0x000001d8 in ?? ()
#17 0xc788c5ac in ?? ()
#18 0xc31e2f70 in ?? ()
#19 0xc2d9c840 in ?? ()
#20 0xd9e98cbc in ?? ()
#21 0xc07b0d49 in biodone (bp=0x8) at /usr/src/sys/kern/vfs_bio.c:3009
Previous frame identical to this frame (corrupt stack?)
(kgdb) list *0xc07422d6
0xc07422d6 is in _mtx_lock_flags (/usr/src/sys/kern/kern_mutex.c:178).
173     void
174     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
175     {
176
177             MPASS(curthread != NULL);
178             KASSERT(m->mtx_lock != MTX_DESTROYED,
179                 ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
180             KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
181                 ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
182                 file, line));
(kgdb) list *0xc31e2fb4
No source file for address 0xc31e2fb4.
(kgdb) list *0xc07b0d49
0xc07b0d49 is in biodone (/usr/src/sys/kern/vfs_bio.c:3010).
3005            if (done == NULL)
3006                    wakeup(bp);
3007            mtx_unlock(&bdonelock);
3008            if (done != NULL)
3009                    done(bp);
3010    }
3011
3012    /*
3013     * Wait for a BIO to finish.
3014     *
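
One possible reading of the numbers above, offered as a sketch and not a
diagnosis: frame #11 shows m=0x1c, and the trap reports a fault address of
0x2c (eva=44). That is what one would expect if m->mtx_lock sits 0x10 bytes
into struct mtx, which in turn suggests m was computed as the address of a
mutex field inside a structure whose own pointer was NULL. The struct
definitions below are simplified stand-ins that I am assuming resemble the
i386 layout; they are not copied from the kernel headers, and the names
lock_object32, mtx32 and mtx_lock_field_address are made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ASSUMPTION: simplified 32-bit stand-ins for the kernel's lock structures.
 * Pointer-sized fields are modeled as uint32_t so the offsets match an i386
 * layout no matter which host compiles this. The real definitions may differ.
 */
struct lock_object32 {
	uint32_t lo_name;         /* const char * on i386 */
	uint32_t lo_type;         /* const char * on i386 */
	uint32_t lo_flags;        /* u_int */
	uint32_t lo_witness_data; /* pointer-sized union on i386 */
};

struct mtx32 {
	struct lock_object32 lock_object;
	uint32_t mtx_lock;        /* word read by the KASSERT at line 178 */
};

/* Address the CPU would touch when evaluating m->mtx_lock for a given m. */
static uint32_t
mtx_lock_field_address(uint32_t m)
{
	return (m + (uint32_t)offsetof(struct mtx32, mtx_lock));
}
```

Under these assumptions, mtx_lock_field_address(0x1c) evaluates to 0x2c,
which matches the "fault virtual address" line in the trap output. If that
reading is right, the thing to look for near vdev_geom.c line 472 would be
a structure pointer that has already been cleared or freed by the time the
I/O completion runs after ad12 detaches.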


Interestingly enough, I can't seem to get a useful backtrace: look at all
of those ?? frames!

Perhaps someone who knows more about kernel debugging than I do can step
me through from here.  I read the kernel debugging section of the FreeBSD
Handbook, but it does not say what to do when the stack is seemingly
corrupt :)

I also have a dump from a time when I hotplugged a SATA drive and it
instantly panicked on me.  Hotplugging has usually worked, but that time
it just gave up.  I'm not sure how interesting this dump is, though, since
I haven't been able to reproduce it (granted, I haven't tried very hard).

-Steven