From: Freddie Cash
To: freebsd-hackers@freebsd.org
Date: Fri, 12 Sep 2008 08:33:27 -0700
Message-Id: <200809120833.28233.fjwcash@gmail.com>
Subject: Re: ZFS w/failing drives - any equivalent of Solaris FMA?

On September 12, 2008 02:45 am Karl Pielorz wrote:
> Recently, a ZFS pool on my FreeBSD box started showing lots of errors
> on one drive in a mirrored pair.
>
> The pool consists of around 14 drives (as 7 mirrored pairs), hung off
> of a couple of SuperMicro 8 port SATA controllers (1 drive of each pair
> is on each controller).
>
> One of the drives started picking up a lot of errors (by the end of
> things it was returning errors pretty much for any reads/writes issued)
> - and taking ages to complete the I/O's.
>
> However, ZFS kept trying to use the drive - e.g. as I attached another
> drive to the remaining 'good' drive in the mirrored pair, ZFS was still
> trying to read data off the failed drive (and remaining good one) in
> order to complete its re-silver to the newly attached drive.

For the one time I've had a drive fail, and the three times I've replaced
drives for larger ones, the process used was:

  zpool offline
  zpool replace

For one machine, I had to shut it off after the offline, as it didn't
have hot-swappable drive bays. For the other machine, it did everything
while online and running.

IOW, the old device never had a chance to interfere with anything. It's
the same process we've used with hardware RAID setups in the past.

> Is there anything similar to this on FreeBSD yet? - i.e. Does/can
> anything on the system tell ZFS "This drive's experiencing failures"
> rather than ZFS just seeing lots of timed out I/O 'errors'? (as appears
> to be the case).

Beyond the periodic script that checks for things like this, and sends
root an e-mail, I haven't seen anything.

-- 
Freddie Cash
fjwcash@gmail.com
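
For anyone wanting to script the offline-then-replace sequence described in the reply, here is a minimal sketch in Python. The pool name "tank" and the device names "ad4" and "ad6" are made up for illustration, and the helper names are hypothetical; only the `zpool offline` and `zpool replace` subcommands come from the message. It defaults to a dry run that just prints the commands, so nothing touches a pool unless asked to.

```python
import subprocess


def replacement_plan(pool, old_dev, new_dev=None):
    """Return the zpool commands, in order, for swapping out one device.

    If new_dev is None, the replacement is assumed to sit in the same
    location as the old device (e.g. a swap in the same drive bay).
    """
    replace_cmd = ["zpool", "replace", pool, old_dev]
    if new_dev is not None:
        replace_cmd.append(new_dev)
    return [
        # Take the old device offline first so ZFS stops issuing I/O to it.
        ["zpool", "offline", pool, old_dev],
        # Then resilver onto the replacement device.
        replace_cmd,
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: pool "tank", failing disk "ad4", new disk "ad6".
    run_plan(replacement_plan("tank", "ad4", "ad6"))
```

On a machine without hot-swap bays, the shutdown and physical swap would happen between the two commands, so running them back to back only makes sense where the new drive is already attached.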