Date: Wed, 12 May 2010 12:21:56 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Ståle Kristoffersen <staale@kristoffersen.ws>
Cc: freebsd-fs@freebsd.org
Subject: Re: Bad hardware + zfs = panic
Message-ID: <20100512102156.GE1703@garage.freebsd.pl>
In-Reply-To: <20100506012217.GA41806@putsch.kolbu.ws>
References: <20100506012217.GA41806@putsch.kolbu.ws>
On Thu, May 06, 2010 at 03:22:17AM +0200, Ståle Kristoffersen wrote:
> I've been debugging a hardware error for the past few days, and I think it
> was the CPU and that it is now fixed. But reading a file that was written to a
> zfs-pool when stuff got corrupted still triggered a panic in ZFS code:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8106f2d3
> stack pointer           = 0x28:0xffffff80774914e0
> frame pointer           = 0x28:0xffffff8077491510
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1350 (smbd)
> trap number             = 12
> panic: page fault
> cpuid = 0
> Uptime: 2m53s
>
> The lines in the backtrace that got my attention were:
> #6 0xffffffff80847c73 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:224
> #7 0xffffffff8106f2d3 in vdev_is_dead (vd=0x0) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
> #8 0xffffffff8106f2ed in vdev_readable (vd=0x0) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854
>
> The complete bt is available here:
> http://heim.ifi.uio.no/staalebk/zfs-panic.txt
>
> As you can see vd=0x0, and I think that caused the panic, since it
> tried to follow that pointer:
> return (vd->vdev_state < VDEV_STATE_DEGRADED);
>
> I then tried to remove the file and I got this:
> Solaris: WARNING: metaslab_free_dva(): bad DVA
> 199476166:1296607792756162560
> Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
> Solaris: WARNING: metaslab_free_dva(): bad DVA
> 935912721:16480078061480073216
>
> Maybe there should be a test to check if vd was zero, and
> throw an io-error or something, instead of panicking?

Well, I don't think it should be possible for vdev to be NULL.
But if you still have this panic, can you try this patch:

	http://people.freebsd.org/~pjd/patches/vdev_mirror.c.patch

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
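
For illustration only, the check suggested in the quoted message (test vd for NULL and fail the read with an I/O error instead of dereferencing it) might look roughly like the sketch below. This is not the content of vdev_mirror.c.patch, which is not reproduced in this thread. The struct and enum are simplified stand-ins for the real ZFS definitions, and vdev_is_dead_checked(), read_from_vdev() and the choice of ENXIO are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins (assumption); the real types live in the ZFS headers. */
typedef enum {
	VDEV_STATE_UNKNOWN = 0,
	VDEV_STATE_CLOSED,
	VDEV_STATE_OFFLINE,
	VDEV_STATE_REMOVED,
	VDEV_STATE_CANT_OPEN,
	VDEV_STATE_FAULTED,
	VDEV_STATE_DEGRADED,
	VDEV_STATE_HEALTHY
} vdev_state_t;

typedef struct vdev {
	vdev_state_t vdev_state;
} vdev_t;

/*
 * Hypothetical variant of vdev_is_dead(): a NULL vdev is reported as
 * dead instead of being dereferenced.
 */
static int
vdev_is_dead_checked(const vdev_t *vd)
{
	if (vd == NULL)
		return (1);
	return (vd->vdev_state < VDEV_STATE_DEGRADED);
}

/*
 * Hypothetical caller: returns an errno value rather than faulting
 * when the vdev is missing or unusable.
 */
static int
read_from_vdev(const vdev_t *vd)
{
	if (vdev_is_dead_checked(vd))
		return (ENXIO);
	return (0);
}
```

With vd == NULL this path returns ENXIO to the caller rather than page-faulting. Whether a NULL vdev should be tolerated at this level at all, or treated as a bug elsewhere, is exactly the open question in the reply above.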