From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 25 03:00:15 2014
Message-ID: <544B12B8.8060302@freebsd.org>
Date: Sat, 25 Oct 2014 04:02:16 +0100
From: Steven Hartland <smh@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Subject: Re: ZFS errors on the array but not the disk.
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
In-Reply-To: <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

An issue that caused resilver restarts was fixed by r265253
<https://svnweb.freebsd.org/base?view=revision&revision=265253>, which
was MFC'ed to stable/10 as r271683
<https://svnweb.freebsd.org/base?view=revision&revision=271683>, so you'll
want to make sure you're running a revision at or later than that.
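
If it helps, here is a rough Python sketch of that check. It assumes your
kernel was built from an svn checkout, so that `uname -v` contains an
rNNNNNN tag, and that you are on stable/10 (on head the number to compare
against would be 265253). The function names and the sample strings are
made up for illustration, not taken from a real system.

```python
import re

# MFC revision on stable/10 mentioned above.
FIXED_IN_STABLE_10 = 271683


def svn_revision(version_string):
    """Return the first rNNNNNN revision found in the string, or None."""
    match = re.search(r"\br(\d+)\b", version_string)
    return int(match.group(1)) if match else None


def has_resilver_fix(version_string, fixed_in=FIXED_IN_STABLE_10):
    """True if the string carries a revision at or after the fix."""
    rev = svn_revision(version_string)
    return rev is not None and rev >= fixed_in


# Hypothetical strings in the style of `uname -v` output.
sample_old = "FreeBSD 10.0-STABLE #0 r267000: Mon Jun  2 10:00:00 UTC 2014"
sample_new = "FreeBSD 10.1-STABLE #0 r273000: Mon Oct 13 10:00:00 UTC 2014"

for sample in (sample_old, sample_new):
    print(svn_revision(sample), has_resilver_fix(sample))
```

On a real machine you would pass in the output of `uname -v` instead of
the sample strings. If no rNNNNNN tag is present the check returns False,
so treat that as "unknown" rather than "unfixed".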

On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
> I manually replaced a disk... and the array was scrubbed recently.
> Interestingly, I seem to be stuck in the "endless loop of resilvering"
> problem.  I can't find much on it, but a resilver will complete and I can
> then run another scrub, which completes too.  Then rebooting causes
> another resilver.
>
> Another odd data point: it seems as if the things that show up as "errors"
> change from resilvering to resilvering.
>
> One bug, it would seem, is that once ZFS has detected an error... another
> scrub can reset it, but no attempt is made to read-through the error if you
> access the object directly.
>
> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers <asomers@freebsd.org> wrote:
>
>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
>> wrote:
>>> What does it mean when checksum errors appear on the array (and the vdev)
>>> but not on any of the disks?  See the paste below.  One would think that
>>> there isn't some ephemeral data stored somewhere that is not one of the
>>> disks, yet "cksum" errors show only on the vdev and the array lines.
>>> Help?
>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>>>    pool: vr2
>>>   state: ONLINE
>>> status: One or more devices is currently being resilvered.  The pool will
>>>          continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>    scan: resilver in progress since Thu Oct 23 23:11:29 2014
>>>          1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>>>          119G resilvered, 6.79% done
>>> config:
>>>
>>>          NAME               STATE     READ WRITE CKSUM
>>>          vr2                ONLINE       0     0    36
>>>            raidz1-0         ONLINE       0     0    72
>>>              label/vr2-d0   ONLINE       0     0     0
>>>              label/vr2-d1   ONLINE       0     0     0
>>>              gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>>>              gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              ada14          ONLINE       0     0     0
>>>              label/vr2-d6   ONLINE       0     0     0
>>>              label/vr2-d7c  ONLINE       0     0     0
>>>              label/vr2-d8   ONLINE       0     0     0
>>>            raidz1-1         ONLINE       0     0     0
>>>              gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e3     ONLINE       0     0     0
>>>              gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>              gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>
>>> errors: 43 data errors, use '-v' for a list
>> The checksum errors will appear on the raidz vdev instead of a leaf if
>> vdev_raidz.c can't determine which leaf vdev was responsible.  This
>> could happen if two or more leaf vdevs return bad data for the same
>> block, which would also lead to unrecoverable data errors.  I see that
>> you have some unrecoverable data errors, so maybe that's what happened
>> to you.
>>
>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
>> to determine which child was responsible for a checksum error.
>> However, I've only seen that happen when a raidz vdev has a mirror
>> child.  That can only happen if the child is a spare or replacing
>> vdev.  Did you activate any spares, or did you manually replace a
>> vdev?
>>
>> -Alan
>>
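
For what it's worth, the pattern Alan describes in the quoted text
(checksum errors counted on a raidz vdev while every child shows zero)
can be picked out of the config table mechanically. A minimal Python
sketch, assuming plain integer counters and the column layout of the
quoted `zpool status` output; the function names are illustrative and
the sample table is a trimmed copy of the one quoted in this thread.

```python
def parse_config(lines):
    """Turn config-table rows into (indent, name, cksum) tuples.

    Rows whose READ/WRITE/CKSUM fields are not plain integers
    (such as the header row) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            indent = len(line) - len(line.lstrip())
            rows.append((indent, fields[0], int(fields[4])))
    return rows


def unattributed_cksum(rows):
    """Names of vdevs with CKSUM > 0 whose descendants all show 0."""
    flagged = []
    for i, (indent, name, cksum) in enumerate(rows):
        if cksum == 0:
            continue
        # Descendants are the following rows indented deeper than this one.
        descendants = []
        for child_indent, _, child_cksum in rows[i + 1:]:
            if child_indent <= indent:
                break
            descendants.append(child_cksum)
        if descendants and all(c == 0 for c in descendants):
            flagged.append(name)
    return flagged


SAMPLE = """\
        NAME               STATE     READ WRITE CKSUM
        vr2                ONLINE       0     0    36
          raidz1-0         ONLINE       0     0    72
            label/vr2-d0   ONLINE       0     0     0
            gpt/vr2-d2c    ONLINE       0     0     0  (resilvering)
          raidz1-1         ONLINE       0     0     0
            gpt/vr2-e0     ONLINE       0     0     0
"""

print(unattributed_cksum(parse_config(SAMPLE.splitlines())))
```

The pool line itself is not flagged because one of its descendants
(raidz1-0) carries a non-zero count; only the vdev where the errors
could not be pinned on a child is reported.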