Date: Tue, 19 Apr 2016 15:52:22 -0400
From: Michael Powell <nightrecon@hotmail.com>
To: freebsd-questions@freebsd.org
Subject: Re: Raid 1+0
Message-ID: <nf627k$a8g$1@ger.gmane.org>
References: <571533F4.8040406@bananmonarki.se> <57153E6B.6090200@gmail.com> <20160418210257.GB86917@neutralgood.org> <64031.128.135.52.6.1461017122.squirrel@cosmo.uchicago.edu> <20160419153824.7b679129f82a3cd0b18b9740@sohara.org>
Steve O'Hara-Smith wrote:
> On Mon, 18 Apr 2016 17:05:22 -0500 (CDT)
> "Valeri Galtsev" <galtsev@kicp.uchicago.edu> wrote:
>
>> Not correct. First of all, in most of the cases, failure of each of the
>> drives are independent events
>
> If only that were so. When the drives are as near identical as
> manufacturing can make them and have had very similar histories they can
> be expected to have very similar wear and be similarly close to failure at
> all times, which makes it likely that the load imposed by one failing will
> push another over.
>

And the more of them you place in the same physical enclosure, the more the vibration patterns, plus any platter skew from mounting that is not perfectly horizontal or perfectly vertical, generate complex interference patterns. The vibrational characteristics of the enclosure matter. In airframe superstructure testing, vibration sensors (think seismology) are scattered throughout, and then something resembling a gun or an air hammer is used to bang on a point in order to map out how the resulting vibration flows through the airframe. (Not my field of endeavor; something I learned from my dad.)

I'm certainly not qualified to debate probability theory. My experience is anecdotal at best, but many sysadmins have witnessed various forms of drive failure in RAID arrays. Most have noticed over the years that it seems to occur most often when all the drives come from the same manufacturing batch run and lot number. After enough of these, a sysadmin will respond by shuffling the drives so they are not all from the same shipment, and, when one does fail, by swapping it out ASAP before another goes and the whole array is lost. (A rough back-of-the-envelope sketch of the arithmetic follows below.)

Another pattern is simple age. I've seen drives that had run for so many years that everyone assumed they were fine. Power them down, and poof - just like that they don't come back. I've had arrays where one drive failed and, when powered down, some of the others would not come back up after power-up. The answer to this is a hot spare plus hot swap.

Anecdotal experience is no substitute for rigorous scientific proof. Most sysadmins are not concerned with such things, but rather with keeping servers running and data flowing, almost to the point of superstition. Whatever works - use it.

-Mike
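For what it's worth, here is a minimal back-of-the-envelope sketch in Python of the independence point above: a 4-drive RAID 1+0 loses data only when both members of some mirror pair die, so whatever correlation exists between the members of a pair dominates the result. The 3% per-drive annual failure rate and the 30% conditional failure rate for a same-batch twin are made-up illustrative figures, not measured AFRs, and rebuild windows are ignored; only the shape of the comparison matters.

# Back-of-the-envelope comparison of RAID 1+0 data-loss probability
# when drive failures are independent vs. positively correlated.
# All rates below are made-up illustrative figures, not measured AFRs.

def pair_loss(p_first, p_second_given_first):
    """Probability that both drives in one mirror pair fail."""
    return p_first * p_second_given_first

def array_loss(pair_loss_prob, n_pairs):
    """RAID 1+0 loses data if any mirror pair loses both members."""
    return 1 - (1 - pair_loss_prob) ** n_pairs

p = 0.03       # hypothetical annual failure rate of a single drive
n_pairs = 2    # 4-drive RAID 1+0: two mirrored pairs

independent = array_loss(pair_loss(p, p), n_pairs)     # batch doesn't matter
correlated  = array_loss(pair_loss(p, 0.30), n_pairs)  # twin is far more likely
                                                       # to die once its mate has

print(f"independent failures: {independent:.3%}")
print(f"correlated failures : {correlated:.3%}")

With these made-up numbers the correlated case comes out roughly ten times more likely to lose the array, which is the gist of the "same batch, same lot" observation above.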