Date:      Thu, 28 Aug 2014 01:36:05 -0500
From:      Scott Bennett <bennett@sdf.org>
To:        paul@kraus-haus.org
Cc:        freebsd-questions@freebsd.org, freebsd@qeng-ho.org, Trond.Endrestol@fagskolen.gjovik.no
Subject:   Re: gvinum raid5 vs. ZFS raidz
Message-ID:  <201408280636.s7S6a5OZ022667@sdf.org>
In-Reply-To: <9588077E-1198-45AF-8C4A-606C46C6E4F8@kraus-haus.org>
References:  <201408020621.s726LsiA024208@sdf.org> <alpine.BSF.2.11.1408020356250.1128@wonkity.com> <53DCDBE8.8060704@qeng-ho.org> <201408060556.s765uKJA026937@sdf.org> <53E1FF5F.1050500@qeng-ho.org> <201408070831.s778VhJc015365@sdf.org> <alpine.BSF.2.11.1408071034510.64214@mail.fig.ol.no> <201408070936.s779akMv017524@sdf.org> <alpine.BSF.2.11.1408071226020.64214@mail.fig.ol.no> <201408071106.s77B6JCI005742@sdf.org> <5B99AAB4-C8CB-45A9-A6F0-1F8B08221917@kraus-haus.org> <201408220940.s7M9e6pZ008296@sdf.org> <7971D6CA-AEE3-447D-8D09-8AC0B9CC6DBE@kraus-haus.org> <201408260641.s7Q6feBc004970@sdf.org> <9588077E-1198-45AF-8C4A-606C46C6E4F8@kraus-haus.org>

Paul Kraus <paul@kraus-haus.org> wrote:
> On Aug 26, 2014, at 2:41, Scott Bennett <bennett@sdf.org> wrote:
> > Paul Kraus <paul@kraus-haus.org> wrote:
> >> On Aug 22, 2014, at 5:40, Scott Bennett <bennett@sdf.org> wrote:
> >>> What I'm seeing here is ~2 KB of errors out
> >>> of ~1.1TB, which is an error rate (in bytes, not bits) of ~1.82e+09, and the
     As I caught and corrected before, the above should have said, "~1.82e-09".

> >>> majority of the erroneous bytes I looked at had multibit errors.  I consider
> >>> that to be a huge change in the actual device error rates, specs be damned.
> >> 
> >> That seems like a very high error rate. Is the drive reporting those errors or are they getting past the drive's error correction and showing up as checksum errors in ZFS ? A drive that is throwing that many errors is clearly defective or dying.
> > 
> >     I'm not using ZFS yet.  Once I get a couple more 2 TB drives, I'll give
> > it a shot.
> >     The numbers are from running direct comparisons between the source file
> > and the copy of it using cmp(1).  In one case, I ran the cmp twice and got
> > identical results, which I interpret as an indication that the errors are
> > occurring during the writes to the target disk during the copying.
>
> Wow. That implies you are hitting a drive with a very high uncorrectable error rate since the drive did not report any errors and the data is corrupt. I have yet to run into one of those.

     How would an uncorrectable error be detected by the drive without any
parity checking or hardware-implemented write-with-verify?
     Are you using any drives larger than 1 TB?  If so, try copying a 1.1 TB
file to one of them, and then try comparing the copy against the original.
Out of the three drives I could test that way, I got that kind of result on
two every time I tried it.  One of the two was a new Samsung (i.e., a
Seagate), and the other was a refurbished Seagate supplied as a replacement
under warranty.  The third got a clean copy the first time and two bytes with
single-bit errors on the second try.  That one was also a refurbished Seagate
provided under warranty.
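A rough sketch of the kind of comparison described above, for anyone wanting more detail than cmp(1) prints by default. This is a hypothetical Python helper, not the procedure used in this thread: it reads the original and the copy in chunks, counts the bytes that differ (the bytes `cmp -l` would list), and classifies each one as a single-bit or multibit error by counting the set bits in the XOR of the two bytes. The names `compare_files`, `CompareResult`, and `byte_error_rate` are illustrative assumptions. For the figure quoted earlier, taking ~2 KB as 2000 bytes and ~1.1 TB as 1.1e12 bytes (both unit interpretations are assumptions) gives 2000 / 1.1e12, or about 1.82e-09 errors per byte.

```python
from dataclasses import dataclass


@dataclass
class CompareResult:
    bytes_compared: int = 0
    bad_bytes: int = 0   # bytes that differ (what `cmp -l` would list)
    single_bit: int = 0  # differing bytes with exactly one flipped bit
    multi_bit: int = 0   # differing bytes with two or more flipped bits


def compare_files(original_path: str, copy_path: str,
                  chunk_size: int = 1 << 20) -> CompareResult:
    """Hypothetical helper: byte-by-byte comparison of two regular files."""
    result = CompareResult()
    with open(original_path, "rb") as a, open(copy_path, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if len(chunk_a) != len(chunk_b):
                raise ValueError("files differ in length")
            if not chunk_a:
                break  # both files reached EOF together
            if chunk_a != chunk_b:  # scan byte by byte only when needed
                for x, y in zip(chunk_a, chunk_b):
                    if x != y:
                        result.bad_bytes += 1
                        # Set bits in the XOR = number of bits that differ.
                        if bin(x ^ y).count("1") == 1:
                            result.single_bit += 1
                        else:
                            result.multi_bit += 1
            result.bytes_compared += len(chunk_a)
    return result


def byte_error_rate(bad_bytes: float, total_bytes: float) -> float:
    """Errors per byte (not per bit)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return bad_bytes / total_bytes
```

Usage would be something like `r = compare_files("/path/to/original", "/path/to/copy")` followed by `byte_error_rate(r.bad_bytes, r.bytes_compared)`; the paths are placeholders. Like cmp(1), this only shows that the two files differ; it cannot by itself say whether the corruption happened while writing the copy or while reading it back, which is why repeating the comparison and getting identical results is informative. A pure-Python per-byte loop is slow, so over a 1.1 TB file it is practical only because the chunk-level equality test skips the byte scan for clean chunks.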


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************


