Date: Sun, 20 Nov 2016 15:02:35 +0000
From: Gary Palmer <gpalmer@freebsd.org>
To: Marek Salwerowicz
Cc: "freebsd-fs@freebsd.org"
Subject: Re: zpool raidz2 stopped working after failure of one drive
Message-ID: <20161120150235.GB99344@in-addr.com>

On Sat, Nov 19, 2016 at 09:15:54PM +0100, Marek Salwerowicz wrote:
> Hi all,
>
> I run the following server:
>
> - Supermicro 6047R-E1R36L
> - 96 GB RAM
> - 1x INTEL CPU E5-2640 v2 @ 2.00GHz
> - FreeBSD 10.3-RELEASE-p11
>
> Drive for OS:
> - HW RAID1: 2x KINGSTON SV300S37A120G
>
> zpool:
> - 18x WD RED 4TB @ raidz2
> - log: mirrored Intel 730 SSD
> - cache: single Intel 730 SSD
>
> Today, after one drive's failure, the whole vdev was removed from the
> zpool (basically the zpool was down; zpool / zfs commands were not
> responding):
> [snip]
> There was no other option than hard-rebooting the server.
> The SMART value "Raw_Read_Error_Rate" for the failed drive has
> increased from 0 to 1. I am about to replace it - it still has
> warranty.
>
> I have now disabled the failing drive in the zpool and it works fine
> (of course, in DEGRADED state until I replace the drive).
>
> However, I am concerned by the fact that one drive's failure has
> completely blocked the zpool.
> Is it normal behaviour for zpools?

What is the setting reported by

  zpool get failmode

By default it is "wait", which I suspect is what caused your issues.
See the zpool man page for more.

> Also, is there already auto hot-spare in ZFS? If I had a hot spare
> drive in my zpool, would it be automatically replaced?

zfsd in 11.0 and later is the current path to hot spare management in
FreeBSD.  FreeBSD 10.x does not have the ability to automatically use
hot spares to replace failing drives.

Regards,

Gary
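
P.S. A rough sketch of checking and adjusting the failmode property
discussed above; the pool name "tank" is only a placeholder for your
own pool:

  # show the current failmode setting
  zpool get failmode tank

  # "wait" blocks all pool I/O until the device recovers; "continue"
  # returns EIO to new writes but keeps reads from healthy devices
  # working; "panic" crashes the host
  zpool set failmode=continue tank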
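
And, on 11.0 or later, a sketch of wiring up a hot spare so zfsd can
use it; the device name "da20" is again a placeholder:

  # attach a spare vdev to the pool
  zpool add tank spare da20

  # enable and start zfsd so the spare is pulled in automatically
  # when a member drive fails
  sysrc zfsd_enable="YES"
  service zfsd start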