From: Michael Powell <nightrecon@hotmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: Raid 1+0
Date: Tue, 19 Apr 2016 15:52:22 -0400

Steve O'Hara-Smith wrote:

> On Mon, 18 Apr 2016 17:05:22 -0500 (CDT)
> "Valeri Galtsev" wrote:
>
>> Not correct. First of all, in most of the cases, failure of each of the
>> drives are independent events
>
> If only that were so. When the drives are as near identical as
> manufacturing can make them and have had very similar histories they can
> be expected to have very similar wear and be similarly close to failure
> at all times, which makes it likely that the load imposed by one failing
> will push another over.

And the more of them you place in the same physical enclosure, the more
their vibration patterns, along with platter skew from mounting that is
neither perfectly horizontal nor perfectly vertical, combine into complex
interference patterns. The vibrational characteristics of the enclosure
itself also matter. In airframe superstructure testing, vibration sensors
(think seismology) are scattered throughout the structure, then a device
resembling a gun or an air hammer is used to strike a point in order to
map how the resulting vibration propagates through the airframe. (Not my
field of endeavor; something I learned from my dad.)

I'm certainly not qualified to debate probability theory, and my
experience is anecdotal at best, but many sysadmins have witnessed various
forms of drive failure in RAID arrays. Most have noticed over the years
that these failures cluster most often when all the drives come from the
same manufacturing batch run and lot number.
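To put rough numbers on why the independence assumption matters, here is a
back-of-envelope sketch in Python. The annual failure rate, rebuild window,
and correlation factor below are all made up for illustration; none of them
come from real field data:

  # Chance that the surviving drive in a 2-way mirror also fails
  # during the rebuild window after the first failure.
  afr = 0.03          # assumed annual failure rate per drive (3%)
  rebuild_hours = 12  # assumed time to resilver onto the replacement

  # Independence assumption: the partner's failure probability is just
  # its base rate scaled down to the rebuild window.
  p_independent = afr * (rebuild_hours / (365 * 24))

  # Correlated case (same batch/lot, shared wear and vibration): suppose
  # the first failure means the partner's hazard is 50x higher. The exact
  # factor is unknowable; this only shows the shape of the problem.
  correlation_factor = 50
  p_correlated = min(1.0, p_independent * correlation_factor)

  print(f"independent: {p_independent:.4%}")  # ~0.0041%
  print(f"correlated:  {p_correlated:.4%}")   # ~0.2055%

Even a modest correlation turns a once-in-a-career event into something a
busy shop will actually see, which matches what people report about
same-lot drives.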
After enough of these incidents a sysadmin learns to shuffle drives so
that no array is built entirely from the same shipment, and, when one does
fail, to swap it out ASAP before another goes and takes the whole array
with it.

Another pattern is simple age. I've seen drives that had run for so many
years that everyone assumed they were fine. Power them down, and poof -
just like that they don't come back. I've had arrays where one drive
failed and, after a power cycle, some of the others never came back up.
The answer to this is a hot spare plus hot swap.

Anecdotal experience is no substitute for rigorous scientific proof, but
most sysadmins are less concerned with proof than with keeping servers
running and data flowing, almost to the point of superstition. Whatever
works - use it.
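For what it's worth, on FreeBSD with ZFS a RAID 1+0 layout with a hot
spare looks something like the following (the daN device names are
placeholders for whatever your disks are called):

  # striped mirrors (RAID 1+0) across four disks, plus a hot spare
  zpool create tank mirror da0 da1 mirror da2 da3 spare da4

  # if da1 dies, pull the spare in by hand and let it resilver
  zpool replace tank da1 da4

The spare sits idle until needed, which is exactly what you want when a
power cycle is the thing most likely to kill a second drive.

-Mike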