From: Ståle Kristoffersen
To: freebsd-fs@freebsd.org
Date: Thu, 6 May 2010 03:22:17 +0200
Message-ID: <20100506012217.GA41806@putsch.kolbu.ws>
Subject: Bad hardware + zfs = panic

I've been debugging
a hardware error for the past few days. I think it was the CPU, and that it is now fixed. But reading a file that was written to a ZFS pool while data was being corrupted still triggered a panic in the ZFS code:

  Fatal trap 12: page fault while in kernel mode
  cpuid = 0; apic id = 00
  fault virtual address   = 0x28
  fault code              = supervisor read data, page not present
  instruction pointer     = 0x20:0xffffffff8106f2d3
  stack pointer           = 0x28:0xffffff80774914e0
  frame pointer           = 0x28:0xffffff8077491510
  code segment            = base 0x0, limit 0xfffff, type 0x1b
                          = DPL 0, pres 1, long 1, def32 0, gran 1
  processor eflags        = interrupt enabled, resume, IOPL = 0
  current process         = 1350 (smbd)
  trap number             = 12
  panic: page fault
  cpuid = 0
  Uptime: 2m53s

The lines in the backtrace that got my attention were:

  #6 0xffffffff80847c73 in calltrap ()
     at /usr/src/sys/amd64/amd64/exception.S:224
  #7 0xffffffff8106f2d3 in vdev_is_dead (vd=0x0)
     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
  #8 0xffffffff8106f2ed in vdev_readable (vd=0x0)
     at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854

The complete bt is available here:
http://heim.ifi.uio.no/staalebk/zfs-panic.txt

As you can see, vd=0x0, and I think that caused the panic, since vdev_is_dead() tried to dereference that pointer:

  return (vd->vdev_state < VDEV_STATE_DEGRADED);

The fault virtual address of 0x28 seems to fit that: it looks like a small field offset from a NULL pointer rather than a real address.

I then tried to remove the file and got this:

  Solaris: WARNING: metaslab_free_dva(): bad DVA 199476166:1296607792756162560
  Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
  Solaris: WARNING: metaslab_free_dva(): bad DVA 935912721:16480078061480073216

Maybe there should be a test to check whether vd is NULL, and return an I/O error or something, instead of panicking?

I'm new to debugging kernels, so if what I'm typing makes no sense, just tell me.

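
A minimal userland sketch of the kind of check suggested above. This is not the actual ZFS code: the vdev_t here is a stripped-down stand-in, the enum only mirrors the ordering implied by the quoted comparison, and the vdev_cant_read field and the try_read() caller are hypothetical names used for illustration. The idea is simply that a NULL vd is treated as a dead device, so the caller gets EIO back instead of the kernel following the pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the vdev states; only the ordering matters here. */
typedef enum {
	VDEV_STATE_UNKNOWN = 0,
	VDEV_STATE_CLOSED,
	VDEV_STATE_OFFLINE,
	VDEV_STATE_REMOVED,
	VDEV_STATE_CANT_OPEN,
	VDEV_STATE_FAULTED,
	VDEV_STATE_DEGRADED,
	VDEV_STATE_HEALTHY
} vdev_state_t;

/* Hypothetical, stripped-down vdev; the real structure has many more fields. */
typedef struct vdev {
	vdev_state_t vdev_state;
	bool vdev_cant_read;	/* assumed field name, for illustration only */
} vdev_t;

/* A NULL vdev is reported as dead instead of being dereferenced. */
static bool
vdev_is_dead(const vdev_t *vd)
{
	if (vd == NULL)
		return (true);
	return (vd->vdev_state < VDEV_STATE_DEGRADED);
}

/* Short-circuit evaluation keeps vd from being touched when it is NULL. */
static bool
vdev_readable(const vdev_t *vd)
{
	return (!vdev_is_dead(vd) && !vd->vdev_cant_read);
}

/* Hypothetical caller: turn an unreadable (or missing) vdev into EIO. */
static int
try_read(const vdev_t *vd)
{
	if (!vdev_readable(vd))
		return (EIO);
	return (0);
}
```

With this guard, try_read(NULL) returns EIO rather than faulting. Whether silently tolerating a NULL vdev at this level is the right place for the check, as opposed to validating the block pointer earlier, is a separate design question this sketch does not settle.
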
Kernel version is:

  FreeBSD fs2 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

-- 
Ståle Kristoffersen