Date:      Fri, 7 Dec 2012 17:22:40 +0100
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        Matt Burke <mattblists@icritical.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: ZFS hang
Message-ID:  <20121207172240.037306e1@fabiankeil.de>
In-Reply-To: <50C1DDE8.9030503@icritical.com>
References:  <50C1CB34.3000308@icritical.com> <50C1DDE8.9030503@icritical.com>


Matt Burke <mattblists@icritical.com> wrote:

> Obviously, the cause of my problems would seem to be a hosed disk. However
> the kernel msgbuf shows no complaints from the drive before reboot.
>
> da8 is a 60GB OCZ Agility 3 SSD (purchased prior to realising just how
> unreliable they are). According to the SMART data, it's had just 146GB of
> reads and 278GB writes over 3 power cycles with only 3 months power on
> time, similar to the others that have failed (~60% failure rate for ours)
>
> I can understand the drive failing, I just can't understand how it hung the
> system. I have had a similar thing happen on one of these machines before
> (with GENERIC and no dumpdev, so no debugging) with one of these disks on
> an Areca HBA.

In CURRENT, parts of the CAM layer can silently hang under certain
circumstances and this can negatively affect various other subsystems
including ZFS:
http://lists.freebsd.org/pipermail/freebsd-current/2012-October/037413.html

I suppose this regression is old enough to have trickled down
to the stable branches by now.

I'm not saying that this is definitively the problem you are
seeing, but I think it would explain the symptoms.

> Could there be a problem with ATA devices on SCSI controllers which is
> causing failures to be silently dropped? Is ZFS lacking a timeout on IO calls?

I believe ZFS is designed with the expectation that timeouts are
handled by the layers below it, so technically it doesn't "lack"
the timeouts for IO calls ...
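
To illustrate that layering, here is a minimal Python sketch. It is not
FreeBSD or ZFS code, and every name in it (lower_layer_read,
upper_layer_read, stuck_device, DeviceTimeout) is hypothetical. The lower
layer owns the deadline and reports a failure; the upper layer keeps no
timer of its own and only reacts to that failure. If the lower layer never
reports one, the upper layer has nothing to react to and simply waits,
which would match a hang without any complaint in the msgbuf.

```python
import threading
import time


class DeviceTimeout(Exception):
    """Raised by the lower layer when a request misses its deadline."""


def lower_layer_read(device_read, timeout_s):
    """Run device_read() in a helper thread and give up after timeout_s.

    Hypothetical stand-in for a driver layer that owns the I/O timeout.
    """
    outcome = {}

    def worker():
        try:
            outcome["data"] = device_read()
        except Exception as exc:  # pass device errors on to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The request is still outstanding: report it instead of waiting.
        raise DeviceTimeout(f"no completion within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["data"]


def upper_layer_read(device_read, timeout_s=0.1):
    """Upper layer: no timer here, it only handles the reported failure."""
    try:
        return lower_layer_read(device_read, timeout_s)
    except DeviceTimeout:
        # Assumption for illustration: treat the device as failed.
        return None


def stuck_device():
    """Simulated device that answers far too late."""
    time.sleep(0.5)
    return b"late"
```

Calling upper_layer_read(stuck_device, timeout_s=0.05) returns None
because the lower layer gives up first. Without the is_alive() check in
lower_layer_read, the same call would block for as long as the device does.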

Fabian



