From owner-freebsd-fs@FreeBSD.ORG  Mon Jun 22 15:50:55 2015
Message-ID: <55882C9F.8020507@digiware.nl>
Date: Mon, 22 Jun 2015 17:41:19 +0200
From: Willem Jan Withagen <wjw@digiware.nl>
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64;
 rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: Michelle Sullivan <michelle@sorbs.net>, 
 Quartz <quartz@sneakertech.com>
CC: fs@freebsd.org
Subject: Re: This diskfailure should not panic a system, but just disconnect
 disk from ZFS
References: <5585767B.4000206@digiware.nl> <5587236A.6020404@sneakertech.com>
 <558769B5.601@sorbs.net>
In-Reply-To: <558769B5.601@sorbs.net>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>

On 22/06/2015 03:49, Michelle Sullivan wrote:
> Quartz wrote:
>> Also:
>>
>>> And thus I'd have expected that ZFS would disconnect /dev/da0 and
>>> then switch to DEGRADED state and continue, letting the operator fix the
>>> broken disk.
>>
>>> Next question to answer is why this WD RED on:
>>
>>> got hung, and nothing for this shows in SMART....
>>
>> You have a raidz2, which means THREE disks need to go down before the
>> pool is unwritable. The problem is most likely your controller or
>> power supply, not your disks.
>>
> Never make such assumptions...
> 
> I have worked in a professional environment where 9 of 12 disks failed
> within 24 hours of each other....  They were all supposed to be from
> different batches but due to an error they came from the same batch and
> the environment was so tightly controlled and the work-load was so
> similar that MTBF was almost identical on all 11 disks in the array...
> the only disk that lasted more than 2 weeks over the failure was the
> hotspare...!
> 

Scary (non)-statistics....
Theories are always nice, but this sort of experience makes your hair go
grey overnight.

--WjW