Date: Fri, 22 Aug 2014 04:40:06 -0500
From: Scott Bennett <bennett@sdf.org>
To: freebsd-questions@freebsd.org, paul@kraus-haus.org
Cc: freebsd@qeng-ho.org, Trond.Endrestol@fagskolen.gjovik.no
Subject: Re: gvinum raid5 vs. ZFS raidz
Message-ID: <201408220940.s7M9e6pZ008296@sdf.org>
In-Reply-To: <5B99AAB4-C8CB-45A9-A6F0-1F8B08221917@kraus-haus.org>
References: <201408020621.s726LsiA024208@sdf.org> <alpine.BSF.2.11.1408020356250.1128@wonkity.com> <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org> <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org> <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no> <201408070936.s779akMv017524@sdf.org> <alpine.BSF.2.11.1408071226020.64214@mail.fig.ol.no> <201408071106.s77B6JCI005742@sdf.org> <5B99AAB4-C8CB-45A9-A6F0-1F8B08221917@kraus-haus.org>
Paul Kraus <paul@kraus-haus.org> wrote:

> On Aug 7, 2014, at 7:06, Scott Bennett <bennett@sdf.org> wrote:
>
> > Even just as parity bits, those would amount to only one bit per
> > eight bytes, which seems inadequate. OTOH, the 520 bytes thing is
> > tickling something in my memory that I can't quite seem to recover,
> > and I don't know (or can't remember) what else those eight bytes
> > might be used for. In any case, at the time I spoke with the guy at
> > Seagate/Samsung, I was unaware of the server grade vs. non-server
> > grade distinction, so I didn't know to ask him anything about whether
> > silent errors should be accepted as "normal" for the server grade of
> > disks.
>
> Take a look at the manufacturer data sheets for these drives. All of
> the ones that I have looked at over the past ten years have included
> the "uncorrectable error rate", and it is generally 1 in 10e-14 for
> "consumer grade drives" and 1 in 1e-15 for "enterprise grade drives".
> That right there shows the order of magnitude difference in this error
> rate between consumer and enterprise drives.

     I'll assume you meant the reciprocals of those ratios or possibly
even 1/10 of the reciprocals. ;-)  What I'm seeing here is ~2 KB of
errors out of ~1.1 TB, which is an error rate (in bytes, not bits) of
~1.82e-09, and the majority of the erroneous bytes I looked at had
multibit errors. I consider that to be a huge change in the actual
device error rates, specs be damned.
     While I was out of town, I came across a trade magazine article
that said that as the areal density of bits approaches the theoretical
limit for the recording technology currently in production, the error
rate climbs ever more steeply, and that drives larger than 1 TB are now
making that effect easily demonstrable.
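     As a back-of-the-envelope check on that observed rate (a sketch
using only the figures quoted above; the exact byte counts are
illustrative assumptions, since "~2 KB" and "~1.1 TB" are approximate):

```python
# Rough check of the observed error rate described in this message.
# Assumed figures: ~2 KB of bad bytes found in ~1.1 TB of data read.
bad_bytes = 2 * 1024            # ~2 KB of erroneous bytes observed
bytes_read = 1.1e12             # ~1.1 TB read (decimal TB)
rate = bad_bytes / bytes_read   # errors per byte read
print(f"observed byte error rate: {rate:.2e}")                  # ~1.86e-09
print(f"i.e. one bad byte per {bytes_read / bad_bytes:.2e} bytes")  # ~5.37e+08
```

Either way it is expressed, that is vastly more frequent than either
datasheet figure suggests.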
:-(  The article went on to describe superficially a new recording
technology due to appear on the mass market in 2015 that will allow
much higher bit densities, while drastically improving the error rate
(at least until densities eventually close in on that technology's
limit). So it may turn out that next year consumers will begin to move
past the hump in error rates and will find that hardware RAID will have
become acceptably safe once again. The description of the new recording
technology looked like a really spiffed-up version of the
magneto-optical disks of the 1990s. In the meantime, though, the
current crops of large-capacity disks apparently require software
solutions like ZFS to preserve data integrity.

> The reason no one even discussed it prior to the appearance of 1 TB
> drives is that over the life of a less-than-1 TB drive you are
> statistically almost assured of NOT running into it. It was still
> there, but no one wrote/read enough data over the life of the drive
> to hit it.

     That sounds reasonable, but it doesn't account for the error rates
I'm seeing.

> On the other hand, I am willing to bet that many of the "random"
> system crashes (and Windows BSODs) were caused by this issue. A hard
> disk returned a single-bit error in a bad place and the system
> crashed.

     Quite possibly so, I'd say.

> Note that all disk drives include some amount of error checking, even
> as far back as the 10 MB MFM drives of the 1980s. Anyone remember
> having to manually manage the "bad block list"? Those were blocks
> that were so bad that the error correction could not fix them. But,
> as far as I can tell, the uncorrectable errors have always been with
> us; we just did not see them.

     I remember hearing about it, but I was safely tucked away on
minicomputers and a mainframe at that point. As I wrote before, an
unrecoverable single-bit error resulted in a bad sector reassignment by
a human or, as was the policy where I worked at the time, replacement
of the disk pack by the vendor.
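     The "statistically almost assured of NOT running into it" point
can be illustrated numerically (a sketch; it reads the datasheet
figures in their usual sense of one unrecoverable error per 1e14 bits
read for consumer drives and per 1e15 bits for enterprise drives):

```python
# Expected number of unrecoverable read errors (UREs) incurred by
# reading an entire drive once, under the datasheet-style rates.
def expected_ures(capacity_bytes: float, errors_per_bit: float) -> float:
    """Mean URE count for one full sequential read of the drive."""
    return capacity_bytes * 8 * errors_per_bit

for capacity_tb in (0.5, 1.0, 2.0, 4.0):
    cap = capacity_tb * 1e12    # decimal TB, as drive vendors count
    print(f"{capacity_tb:3.1f} TB: "
          f"consumer {expected_ures(cap, 1e-14):.3f}  "
          f"enterprise {expected_ures(cap, 1e-15):.4f}")
```

A full read of a 1 TB consumer drive expects only ~0.08 UREs, so a
sub-1 TB drive is indeed unlikely to show one in ordinary use, while a
4 TB drive expects roughly one URE every three full reads. The rate
observed above is orders of magnitude worse than either spec.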
     The smallest drive on any pee cee that I used was 20 MB with an
average seek time of 68 ms, IIRC. Even that 4 MHz 8088 had to cool its
heels for long stretches of time with that drive. Fortunately for me, I
didn't have to use that machine for work, but rather for an applied
microclimatology course I was taking at the time.


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************