From owner-freebsd-fs@FreeBSD.ORG  Thu May  6 01:42:50 2010
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9B0E11065675
	for <freebsd-fs@freebsd.org>; Thu,  6 May 2010 01:42:50 +0000 (UTC)
	(envelope-from staale@kristoffersen.ws)
Received: from mail-forward.uio.no (mail-forward.uio.no [129.240.10.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 2505F8FC0C
	for <freebsd-fs@freebsd.org>; Thu,  6 May 2010 01:42:49 +0000 (UTC)
Received: from mail-mx2.uio.no ([129.240.10.30])
	by pat.uio.no with esmtp (Exim 4.67)
	(envelope-from <staale@kristoffersen.ws>) id 1O9pn8-0004u7-KY
	for freebsd-fs@freebsd.org; Thu, 06 May 2010 03:22:18 +0200
Received: from putsch.kolbu.ws ([158.36.191.193])
	by mail-mx2.uio.no with esmtps (TLSv1:AES256-SHA:256) (Exim 4.69)
	(envelope-from <staale@kristoffersen.ws>) id 1O9pn8-00036F-0z
	for freebsd-fs@freebsd.org; Thu, 06 May 2010 03:22:18 +0200
Received: from chiller by putsch.kolbu.ws with local (Exim 4.71 (FreeBSD))
	(envelope-from <staale@kristoffersen.ws>) id 1O9pn7-000BeM-Pn
	for freebsd-fs@freebsd.org; Thu, 06 May 2010 03:22:17 +0200
Date: Thu, 6 May 2010 03:22:17 +0200
From: =?iso-8859-1?Q?St=E5le?= Kristoffersen <staale@kristoffersen.ws>
To: freebsd-fs@freebsd.org
Message-ID: <20100506012217.GA41806@putsch.kolbu.ws>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.18 (2008-05-17)
X-UiO-Spam-info: not spam, SpamAssassin (score=-5.0, required=5.0,
	autolearn=disabled, UIO_MAIL_IS_INTERNAL=-5, uiobl=NO,
	uiouri=NO)
X-UiO-Scanned: 9366186A0607D88E4E8511B24A4A2ADE5E567E9D
X-UiO-SPAM-Test: remote_host: 158.36.191.193 spam_score: -49 maxlevel 80
	minaction 2 bait 0 mail/h: 1 total 584 max/h 11 blacklist 0
	greylist 0 ratelimit 0
Subject: Bad hardware + zfs = panic
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 06 May 2010 01:42:50 -0000

I've been debugging a hardware error for the past few days. I think the CPU
was at fault and that the problem is now fixed. However, reading a file that
was written to a ZFS pool while data was being corrupted still triggered a
panic in the ZFS code:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8106f2d3
stack pointer           = 0x28:0xffffff80774914e0
frame pointer           = 0x28:0xffffff8077491510
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1350 (smbd)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2m53s

The lines in the backtrace that got my attention were:
#6  0xffffffff80847c73 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224
#7  0xffffffff8106f2d3 in vdev_is_dead (vd=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
#8  0xffffffff8106f2ed in vdev_readable (vd=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854

The complete backtrace is available here:
http://heim.ifi.uio.no/staalebk/zfs-panic.txt

As you can see, vd=0x0, and I think that caused the panic, since
vdev_is_dead() tried to dereference that NULL pointer (which would fit the
fault address 0x28, a small offset from NULL):
 return (vd->vdev_state < VDEV_STATE_DEGRADED);

I then tried to remove the file and I got this:
Solaris: WARNING: metaslab_free_dva(): bad DVA 199476166:1296607792756162560
Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
Solaris: WARNING: metaslab_free_dva(): bad DVA 935912721:16480078061480073216
Maybe there should be a check for vd being NULL that returns an I/O error
or something similar, instead of panicking?
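
Something along these lines is what I have in mind. This is only a
standalone sketch, not the actual ZFS code: the struct is a stand-in that
holds just the one field used here, the state names and values are
placeholders, and treating a NULL vdev as "dead" is my assumption about
what the right behaviour would be.

```c
#include <stddef.h>

/* Placeholder states; only their relative order matters in this sketch. */
typedef enum {
	SKETCH_STATE_UNKNOWN = 0,
	SKETCH_STATE_CANT_OPEN,
	SKETCH_STATE_DEGRADED,
	SKETCH_STATE_HEALTHY
} sketch_state_t;

/* Stand-in for the real vdev_t, reduced to the field the check reads. */
typedef struct sketch_vdev {
	sketch_state_t vdev_state;
} sketch_vdev_t;

/*
 * Same comparison as the line quoted above, but a NULL vdev is reported
 * as dead instead of being dereferenced.
 */
int
sketch_vdev_is_dead(const sketch_vdev_t *vd)
{
	if (vd == NULL)
		return (1);
	return (vd->vdev_state < SKETCH_STATE_DEGRADED);
}

/* A vdev is readable only if it is not dead, so NULL is never readable. */
int
sketch_vdev_readable(const sketch_vdev_t *vd)
{
	return (!sketch_vdev_is_dead(vd));
}
```

With a check like this the caller would see the vdev as unreadable and
could fail the read with an I/O error. Whether that is safe at every call
site in the real code is something I can't judge.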

I'm new to debugging kernels, so if what I'm typing makes no sense, just
tell me.

Kernel version is:
FreeBSD fs2 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

-- 
Ståle Kristoffersen