From: Michael Powell <nightrecon@hotmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: Raid 1+0
Date: Tue, 19 Apr 2016 15:52:22 -0400

Steve O'Hara-Smith wrote:

> On Mon, 18 Apr 2016 17:05:22 -0500 (CDT)
> "Valeri Galtsev" wrote:
>
>> Not correct. First of all, in most of the cases, failure of each of the
>> drives are independent events
>
> If only that were so. When the drives are as near identical as
> manufacturing can make them and have had very similar histories they can
> be expected to have very similar wear and be similarly close to failure
> at all times, which makes it likely that the load imposed by one failing
> will push another over.

And the more of them you place in the same physical enclosure, the more
their vibration patterns, along with platter skew from mounting that is
neither perfectly horizontal nor perfectly vertical, combine into complex
interference patterns. The vibrational characteristics of the enclosure
itself also matter. In airframe superstructure testing, vibration sensors
(think seismology) are scattered throughout the structure, then a device
resembling a gun or an air hammer is used to strike a point in order to
map how the resulting vibration propagates through the airframe. (Not my
field of endeavor; something I learned from my dad.)

I'm certainly not qualified to debate probability theory, and my
experience is anecdotal at best, but many sysadmins have witnessed various
forms of drive failure in RAID arrays. Most have noticed over the years
that these failures cluster most often when all the drives come from the
same manufacturing batch run and lot number.
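To put rough numbers on why the independence assumption matters, here is a
back-of-envelope sketch in Python. The annual failure rate, rebuild window,
and correlation factor below are all made up for illustration; none of them
come from real field data:

  # Chance that the surviving drive in a 2-way mirror also fails
  # during the rebuild window after the first failure.
  afr = 0.03          # assumed annual failure rate per drive (3%)
  rebuild_hours = 12  # assumed time to resilver onto the replacement

  # Independence assumption: the partner's failure probability is just
  # its base rate scaled down to the rebuild window.
  p_independent = afr * (rebuild_hours / (365 * 24))

  # Correlated case (same batch/lot, shared wear and vibration): suppose
  # the first failure means the partner's hazard is 50x higher. The exact
  # factor is unknowable; this only shows the shape of the problem.
  correlation_factor = 50
  p_correlated = min(1.0, p_independent * correlation_factor)

  print(f"independent: {p_independent:.4%}")  # ~0.0041%
  print(f"correlated:  {p_correlated:.4%}")   # ~0.2055%

Even a modest correlation turns a once-in-a-career event into something a
busy shop will actually see, which matches what people report about
same-lot drives.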
After enough of these incidents a sysadmin learns to shuffle drives so
that no array is built entirely from the same shipment, and, when one does
fail, to swap it out ASAP before another goes and takes the whole array
with it.

Another pattern is simple age. I've seen drives that had run for so many
years that everyone assumed they were fine. Power them down, and poof -
just like that they don't come back. I've had arrays where one drive
failed and, after a power cycle, some of the others never came back up.
The answer to this is a hot spare plus hot swap.

Anecdotal experience is no substitute for rigorous scientific proof, but
most sysadmins are less concerned with proof than with keeping servers
running and data flowing, almost to the point of superstition. Whatever
works - use it.
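For what it's worth, on FreeBSD with ZFS a RAID 1+0 layout with a hot
spare looks something like the following (the daN device names are
placeholders for whatever your disks are called):

  # striped mirrors (RAID 1+0) across four disks, plus a hot spare
  zpool create tank mirror da0 da1 mirror da2 da3 spare da4

  # if da1 dies, pull the spare in by hand and let it resilver
  zpool replace tank da1 da4

The spare sits idle until needed, which is exactly what you want when a
power cycle is the thing most likely to kill a second drive.

-Mike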