Date: Sat, 25 Oct 2014 04:02:16 +0100
From: Steven Hartland <smh@freebsd.org>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS errors on the array but not the disk.
Message-ID: <544B12B8.8060302@freebsd.org>
In-Reply-To: <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
There was an issue which would cause resilver restarts, fixed by r265253
<https://svnweb.freebsd.org/base?view=revision&revision=265253>, which was
MFC'ed to stable/10 by r271683
<https://svnweb.freebsd.org/base?view=revision&revision=271683>, so you'll
want to make sure you're later than that.

On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
> I manually replaced a disk... and the array was scrubbed recently.
> Interestingly, I seem to be in the "endless loop" of resilvering problem.
> Not much I can find on it, but resilvering will complete and I can then
> run another scrub. It will complete, too. Then rebooting causes another
> resilvering.
>
> Another odd data point: it seems as if the things that show up as "errors"
> change from resilvering to resilvering.
>
> One bug, it would seem, is that once ZFS has detected an error... another
> scrub can reset it, but no attempt is made to read through the error if you
> access the object directly.
>
> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers <asomers@freebsd.org> wrote:
>
>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
>> wrote:
>>> What does it mean when checksum errors appear on the array (and the vdev)
>>> but not on any of the disks? See the paste below. One would think that
>>> there isn't some ephemeral data stored somewhere that is not one of the
>>> disks, yet "cksum" errors show only on the vdev and the array lines.
>>> Help?
>>>
>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>>>   pool: vr2
>>>  state: ONLINE
>>> status: One or more devices is currently being resilvered.  The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>   scan: resilver in progress since Thu Oct 23 23:11:29 2014
>>>         1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>>>         119G resilvered, 6.79% done
>>> config:
>>>
>>>         NAME               STATE     READ WRITE CKSUM
>>>         vr2                ONLINE       0     0    36
>>>           raidz1-0         ONLINE       0     0    72
>>>             label/vr2-d0   ONLINE       0     0     0
>>>             label/vr2-d1   ONLINE       0     0     0
>>>             gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>>>             gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             ada14          ONLINE       0     0     0
>>>             label/vr2-d6   ONLINE       0     0     0
>>>             label/vr2-d7c  ONLINE       0     0     0
>>>             label/vr2-d8   ONLINE       0     0     0
>>>           raidz1-1         ONLINE       0     0     0
>>>             gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e3     ONLINE       0     0     0
>>>             gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>
>>> errors: 43 data errors, use '-v' for a list
>>
>> The checksum errors will appear on the raidz vdev instead of a leaf if
>> vdev_raidz.c can't determine which leaf vdev was responsible. This
>> could happen if two or more leaf vdevs return bad data for the same
>> block, which would also lead to unrecoverable data errors. I see that
>> you have some unrecoverable data errors, so maybe that's what happened
>> to you.
>>
>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
>> to determine which child was responsible for a checksum error.
>> However, I've only seen that happen when a raidz vdev has a mirror
>> child. That can only happen if the child is a spare or replacing
>> vdev.
>> Did you activate any spares, or did you manually replace a vdev?
>>
>> -Alan
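[As a practical footnote to the advice above: Steven's "make sure you're later than r271683" can be checked from the kernel ident string. A minimal sketch, assuming a kernel built from an svn checkout so that `uname -v` embeds an r-number; the `kernel_ident` sample string here is hypothetical, and on a real system you would use `kernel_ident="$(uname -v)"` instead.]

```shell
# Hypothetical ident string; on a live box use: kernel_ident="$(uname -v)"
kernel_ident="FreeBSD 10.1-STABLE #0 r271700: Mon Sep 22 10:00:00 UTC 2014"

# Extract the svn revision (the "rNNNNNN:" token) from the ident string.
rev=$(printf '%s\n' "$kernel_ident" | sed -n 's/.* r\([0-9][0-9]*\):.*/\1/p')

# Compare against the stable/10 MFC of the resilver-restart fix.
if [ -n "$rev" ] && [ "$rev" -ge 271683 ]; then
    echo "kernel r$rev includes the r271683 resilver-restart MFC"
else
    echo "kernel predates r271683 (or no revision found); update stable/10"
fi
```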
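[Alan's point, that the CKSUM counts sit on the pool and raidz rows while every leaf shows zero, is easy to see by filtering the status output. A sketch using an abridged, hypothetical sample modeled on the paste above rather than live `zpool` output:]

```shell
# Abridged, hypothetical sample modeled on the zpool status paste above.
zpool_sample='NAME              STATE     READ WRITE CKSUM
vr2               ONLINE       0     0    36
  raidz1-0        ONLINE       0     0    72
    label/vr2-d0  ONLINE       0     0     0
    label/vr2-d1  ONLINE       0     0     0'

# Print only rows with a nonzero CKSUM count.  The header row is skipped
# naturally because the string "CKSUM" coerces to 0 in $5+0.
printf '%s\n' "$zpool_sample" | awk '$5+0 > 0 { print $1, "cksum=" $5 }'
```

[On the live system, `zpool status -v vr2` shows the same counters plus the list of the 43 damaged files that the summary line refers to.]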