From: Fabian Keil
To: Matt Burke
Cc: freebsd-fs@freebsd.org
Date: Fri, 7 Dec 2012 17:22:40 +0100
Subject: Re: ZFS hang

Matt Burke wrote:

> Obviously, the cause of my problems would seem to be a hosed disk. However
> the kernel msgbuf shows no complaints from the drive before reboot.
>
> da8 is a 60GB OCZ Agility 3 SSD (purchased prior to realising just how
> unreliable they are).
> According to the SMART data, it's had just 146GB of
> reads and 278GB writes over 3 power cycles with only 3 months power on
> time, similar to the others that have failed (~60% failure rate for ours)
>
> I can understand the drive failing, I just can't understand how it hung the
> system. I have had a similar thing happen on one of these machines before
> (with GENERIC and no dumpdev, so no debugging) with one of these disks on
> an Areca HBA.

In CURRENT, parts of the CAM layer can silently hang under certain
circumstances and this can negatively affect various other subsystems
including ZFS:
http://lists.freebsd.org/pipermail/freebsd-current/2012-October/037413.html

I suppose this regression is old enough to have trickled down to the
stable branches by now. I'm not saying that this is definitively the
problem you are seeing, but I think it would explain the symptoms.

> Could there be a problem with ATA devices on SCSI controllers which is
> causing failures to be silently dropped? Is ZFS lacking a timeout on IO calls?

I believe ZFS is designed with the expectation that timeouts are handled
by the layers below it, so technically it doesn't "lack" the timeouts
for IO calls ...

Fabian