Date: Fri, 12 Sep 2008 06:21:02 -0700
From: Jeremy Chadwick
To: Karl Pielorz
Cc: freebsd-hackers@freebsd.org
Subject: Re: ZFS w/failing drives - any equivalent of Solaris FMA?
Message-ID: <20080912132102.GB56923@icarus.home.lan>

On Fri, Sep 12, 2008 at 10:45:24AM +0100, Karl Pielorz wrote:
> Recently, a ZFS pool on my FreeBSD box started showing lots of errors
> on one drive in a mirrored pair.
>
> The pool consists of around 14 drives (as 7 mirrored pairs), hung off
> of a couple of SuperMicro 8-port SATA controllers (1 drive of each
> pair is on each controller).
>
> One of the drives started picking up a lot of errors (by the end of
> things it was returning errors for pretty much any reads/writes
> issued) - and taking ages to complete the I/Os.
>
> However, ZFS kept trying to use the drive - e.g. as I attached another
> drive to the remaining 'good' drive in the mirrored pair, ZFS was
> still trying to read data off the failed drive (and the remaining good
> one) in order to complete its re-silver to the newly attached drive.
>
> Having posted on the OpenSolaris ZFS list - it appears that, under
> Solaris, there's an 'FMA Engine' which communicates drive failures and
> the like to ZFS - advising ZFS when a drive should be marked as
> 'failed'.
>
> Is there anything similar to this on FreeBSD yet? I.e. does/can
> anything on the system tell ZFS "this drive's experiencing failures",
> rather than ZFS just seeing lots of timed-out I/O 'errors'? (as
> appears to be the case).

As far as I know, there is no such "standard" mechanism in FreeBSD.  If
the drive falls off the bus entirely (e.g. gets detached), I would hope
ZFS would notice that.  It might also depend on whether the disk
subsystem you're using utilises CAM (i.e. disks showing up as daX, not
adX); Scott Long might know if something like this is implemented in
CAM.  I'm fairly certain nothing like this is implemented in ata(4).
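
In the meantime, the closest thing I know of is deciding for yourself
that the drive is bad and telling ZFS to stop using it.  Roughly (an
untested sketch -- "tank" and the daX device names below are
placeholders, substitute your own pool and disks):

  # see which device is racking up read/write/checksum errors
  zpool status -v tank

  # attach a spare disk as an extra mirror of the good half
  zpool attach tank da2 da8

  # take the failing disk out of service (-t = only until next reboot)
  zpool offline -t tank da3

  # or drop it from the mirror entirely
  zpool detach tank da3

That's entirely manual and after-the-fact, of course, which is exactly
the part an FMA-like layer would be automating for you.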
Ideally, it would be the job of the controller and the controller
driver to announce whether underlying I/O operations fail or succeed.
Do you agree?  I hope this "FMA Engine" on Solaris only *tells* the
underlying pieces about I/O errors, rather than acting on them (e.g.
automatically yanking the disk off the bus for you).  I'm in no way
shunning Solaris; I'm simply saying such a mechanism could be as
risky/deadly as it could be useful.

> In the end, the failing drive was timing out on literally every I/O -
> I did recover the situation by detaching it from the pool (which hung
> the machine - probably caused by ZFS having to update the meta-data
> on all drives, including the failed one).  A reboot brought the pool
> back, minus the 'failed' drive, so enough of the 'detach' must have
> completed.
>
> The newly attached drive completed the re-silver in half an hour (as
> opposed to an estimated 755 hours and climbing with the other drive
> still in the pool, limping along).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |