Date:      Wed, 12 May 2010 12:21:56 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Ståle Kristoffersen <staale@kristoffersen.ws>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Bad hardware + zfs = panic
Message-ID:  <20100512102156.GE1703@garage.freebsd.pl>
In-Reply-To: <20100506012217.GA41806@putsch.kolbu.ws>
References:  <20100506012217.GA41806@putsch.kolbu.ws>

On Thu, May 06, 2010 at 03:22:17AM +0200, Ståle Kristoffersen wrote:
> I've been debugging a hardware error for the past few days, and I think it
> was the CPU and that it is now fixed. But reading a file that was written to a
> zfs-pool when stuff got corrupted still triggered a panic in ZFS code:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff8106f2d3
> stack pointer           = 0x28:0xffffff80774914e0
> frame pointer           = 0x28:0xffffff8077491510
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1350 (smbd)
> trap number             = 12
> panic: page fault
> cpuid = 0
> Uptime: 2m53s
>
> The lines in the backtrace that got my attention were:
> #6  0xffffffff80847c73 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:224
> #7  0xffffffff8106f2d3 in vdev_is_dead (vd=0x0) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1847
> #8  0xffffffff8106f2ed in vdev_readable (vd=0x0) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:1854
>
> The complete bt is available here:
> http://heim.ifi.uio.no/staalebk/zfs-panic.txt
>
> As you can see vd=0x0, and I think that caused the panic, since it
> tried to follow that pointer:
>  return (vd->vdev_state < VDEV_STATE_DEGRADED);
>
> I then tried to remove the file and I got this:
> Solaris: WARNING: metaslab_free_dva(): bad DVA 199476166:1296607792756162560
> Solaris: WARNING: metaslab_free_dva(): bad DVA 4236221:7256850009726709760
> Solaris: WARNING: metaslab_free_dva(): bad DVA 935912721:16480078061480073216
>
> Maybe there should be a test to check if vd was zero, and
> throw an io-error or something, instead of panicking?

Well, I don't think it should be possible for vdev to be NULL.
But if you still have this panic, can you try this patch:

	http://people.freebsd.org/~pjd/patches/vdev_mirror.c.patch
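
For illustration only, here is a minimal user-space sketch of the kind of NULL check suggested in the quoted text. It is not the contents of the patch above. The vdev_t here is a simplified stand-in with a single field, and vdev_is_dead_checked() and read_from_vdev() are hypothetical names, not functions from vdev.c. Only the comparison against VDEV_STATE_DEGRADED is taken from the quoted source line.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the ZFS vdev states, worst to best. */
typedef enum {
	VDEV_STATE_UNKNOWN = 0,
	VDEV_STATE_CLOSED,
	VDEV_STATE_OFFLINE,
	VDEV_STATE_REMOVED,
	VDEV_STATE_CANT_OPEN,
	VDEV_STATE_FAULTED,
	VDEV_STATE_DEGRADED,
	VDEV_STATE_HEALTHY
} vdev_state_t;

/* Assumption: the real vdev_t has many more fields; only the state matters here. */
typedef struct vdev {
	vdev_state_t vdev_state;
} vdev_t;

/*
 * Hypothetical variant of vdev_is_dead(): a NULL vdev is reported as dead
 * instead of being dereferenced.
 */
int
vdev_is_dead_checked(const vdev_t *vd)
{
	if (vd == NULL)
		return (1);
	return (vd->vdev_state < VDEV_STATE_DEGRADED);
}

/*
 * Hypothetical caller: returns 0 if the vdev can be read from,
 * EIO otherwise, so a bad pointer becomes an I/O error, not a panic.
 */
int
read_from_vdev(const vdev_t *vd)
{
	if (vdev_is_dead_checked(vd))
		return (EIO);
	return (0);
}
```

With this sketch, read_from_vdev(NULL) returns EIO rather than faulting. Whether a NULL vdev should be tolerated at that level at all, rather than rejected earlier when the corrupted block pointer is first decoded, is a separate design question.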

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
