From: Pawel Jakub Dawidek
Date: Wed, 12 May 2010 13:08:03 +0200
To: Ståle Kristoffersen
Cc: freebsd-fs@freebsd.org
Subject: Re: Bad hardware + zfs = panic
Message-ID: <20100512110803.GF1703@garage.freebsd.pl>
In-Reply-To: <20100512102156.GE1703@garage.freebsd.pl>
References: <20100506012217.GA41806@putsch.kolbu.ws> <20100512102156.GE1703@garage.freebsd.pl>
On Wed, May 12, 2010 at 12:21:56PM +0200, Pawel Jakub Dawidek wrote:
> On Thu, May 06, 2010 at 03:22:17AM +0200, Ståle Kristoffersen wrote:
> > I've been debugging a hardware error for the past few days, and I think it
> > was the CPU and that it is now fixed. But reading a file that was written to a
> > zfs-pool when stuff got corrupted still triggered a panic in ZFS code:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x28
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff8106f2d3
> > stack pointer           = 0x28:0xffffff80774914e0
> > frame pointer           = 0x28:0xffffff8077491510
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 1350 (smbd)
> > trap number             = 12
> > panic: page fault
> > cpuid = 0
> > Uptime: 2m53s
> >
> > The lines in the backtrace that got my attention were:
> > #6 0xffffffff80847c73 in calltrap () at
> > /usr/src/sys/amd64/amd64/exception.S:224
> > #7 0xffffffff8106f2d3 in vdev_is_dead (vd=0x0) at
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
> > #8 0xffffffff8106f2ed in vdev_readable (vd=0x0) at
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854
> >
> > The complete bt is available here:
> > http://heim.ifi.uio.no/staalebk/zfs-panic.txt
> >
> > As you can see vd=0x0, and I think that caused the panic, since it
> > tried to follow that pointer:
> > return (vd->vdev_state < VDEV_STATE_DEGRADED);
> >
> > I then tried to remove the file and I got this:
> > Solaris: WARNING: metaslab_free_dva(): bad DVA 199476166:1296607792756162560
> > Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
> > Solaris: WARNING: metaslab_free_dva(): bad DVA 935912721:16480078061480073216
> >
> > Maybe there should be a test to check if vd was zero, and
> > throw an io-error or something, instead of panicking?
>
> Well, I don't think it should be possible for vdev to be NULL.
> But if you still have this panic, can you try this patch:
>
> http://people.freebsd.org/~pjd/patches/vdev_mirror.c.patch

It looks like:

	http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6435666

The work-around is to remove /boot/zfs/zpool.cache and import the pool
again.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
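
Editorial illustration of the check suggested in the quoted message (test
whether vd is NULL and return an I/O error rather than panic). This is a
minimal sketch only: the types below are simplified stand-ins, and
vdev_is_dead_checked() and read_from_vdev() are hypothetical names, not
functions from the ZFS source or from the patch linked above. Only the
comparison against VDEV_STATE_DEGRADED is taken from the thread.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for illustration; not the real ZFS definitions. */
typedef enum {
	VDEV_STATE_UNKNOWN = 0,
	VDEV_STATE_CLOSED,
	VDEV_STATE_OFFLINE,
	VDEV_STATE_REMOVED,
	VDEV_STATE_CANT_OPEN,
	VDEV_STATE_FAULTED,
	VDEV_STATE_DEGRADED,
	VDEV_STATE_HEALTHY
} vdev_state_t;

typedef struct vdev {
	vdev_state_t vdev_state;
} vdev_t;

/*
 * Hypothetical guarded variant of the check quoted in the thread.
 * A NULL vdev is reported as dead instead of being dereferenced.
 */
int
vdev_is_dead_checked(const vdev_t *vd)
{
	if (vd == NULL)
		return (1);
	return (vd->vdev_state < VDEV_STATE_DEGRADED);
}

/*
 * Hypothetical caller: turn a dead or missing vdev into EIO so the
 * read fails with an I/O error rather than a page fault.
 */
int
read_from_vdev(const vdev_t *vd)
{
	if (vdev_is_dead_checked(vd))
		return (EIO);
	return (0);	/* a real implementation would issue the read here */
}
```

Whether such a guard belongs in vdev_is_dead() itself or in its callers is
a design question this sketch does not settle; as noted in the reply, the
vdev is not expected to be NULL in the first place, so a guard would hide
the symptom rather than fix the stale pool configuration.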