Date: Sun, 20 Nov 2016 15:02:35 +0000
From: Gary Palmer <gpalmer@freebsd.org>
To: Marek Salwerowicz
Cc: "freebsd-fs@freebsd.org"
Subject: Re: zpool raidz2 stopped working after failure of one drive
Message-ID: <20161120150235.GB99344@in-addr.com>

On Sat, Nov 19, 2016 at 09:15:54PM +0100, Marek Salwerowicz wrote:
> Hi all,
>
> I run the following server:
>
> - Supermicro 6047R-E1R36L
> - 96 GB RAM
> - 1x INTEL CPU E5-2640 v2 @ 2.00GHz
> - FreeBSD 10.3-RELEASE-p11
>
> Drive for OS:
> - HW RAID1: 2x KINGSTON SV300S37A120G
>
> zpool:
> - 18x WD RED 4TB @ raidz2
> - log: mirrored Intel 730 SSD
> - cache: single Intel 730 SSD
>
> Today, after one drive's failure, the whole vdev was removed from the
> zpool (basically the zpool was down; zpool / zfs commands were not
> responding):
> [snip]
> There was no other option than hard-rebooting the server.
> The SMART value "Raw_Read_Error_Rate" for the failed drive has
> increased from 0 to 1. I am about to replace it - it still has
> warranty.
>
> I have now disabled the failing drive in the zpool and it works fine
> (of course, in DEGRADED state until I replace the drive).
>
> However, I am concerned by the fact that one drive's failure has
> completely blocked the zpool.
> Is it normal behaviour for zpools?

What is the setting reported by

  zpool get failmode

By default it is "wait", which I suspect is what caused your issues.
See the zpool man page for more.

> Also, is there already auto hot-spare in ZFS? If I had a hot spare
> drive in my zpool, would it be automatically replaced?

zfsd in 11.0 and later is the current path to hot spare management in
FreeBSD.  FreeBSD 10.x does not have the ability to automatically use
hot spares to replace failing drives.

Regards,

Gary
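
P.S. A rough sketch of checking and adjusting the failmode property
discussed above; the pool name "tank" is only a placeholder for your
own pool:

  # show the current failmode setting
  zpool get failmode tank

  # "wait" blocks all pool I/O until the device recovers; "continue"
  # returns EIO to new writes but keeps reads from healthy devices
  # working; "panic" crashes the host
  zpool set failmode=continue tank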
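
And, on 11.0 or later, a sketch of wiring up a hot spare so zfsd can
use it; the device name "da20" is again a placeholder:

  # attach a spare vdev to the pool
  zpool add tank spare da20

  # enable and start zfsd so the spare is pulled in automatically
  # when a member drive fails
  sysrc zfsd_enable="YES"
  service zfsd start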