Message-ID: <544B12B8.8060302@freebsd.org>
Date: Sat, 25 Oct 2014 04:02:16 +0100
From: Steven Hartland
To: freebsd-fs@freebsd.org
Subject: Re: ZFS errors on the array but not the disk.

There was an issue that could cause resilver restarts. It was fixed by
*265253*, which was MFC'ed to stable/10 by *271683*, so you'll want to
make sure you're running something later than that.

On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
> I manually replaced a disk... and the array was scrubbed recently.
> Interestingly, I seem to be in the "endless loop" of resilvering problem.
> Not much I can find on it, but resilvering will complete and I can then
> run another scrub.
> It will complete, too. Then rebooting causes another
> resilvering.
>
> Another odd data point: it seems as if the things that show up as "errors"
> change from resilvering to resilvering.
>
> One bug, it would seem, is that once ZFS has detected an error... another
> scrub can reset it, but no attempt is made to read-through the error if you
> access the object directly.
>
> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers wrote:
>
>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox
>> wrote:
>>> What does it mean when checksum errors appear on the array (and the vdev)
>>> but not on any of the disks? See the paste below. One would think that
>>> there isn't some ephemeral data stored somewhere that is not one of the
>>> disks, yet "cksum" errors show only on the vdev and the array lines.
>>> Help?
>>>
>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>>>   pool: vr2
>>>  state: ONLINE
>>> status: One or more devices is currently being resilvered. The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>   scan: resilver in progress since Thu Oct 23 23:11:29 2014
>>>         1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>>>         119G resilvered, 6.79% done
>>> config:
>>>
>>>         NAME                STATE     READ WRITE CKSUM
>>>         vr2                 ONLINE       0     0    36
>>>           raidz1-0          ONLINE       0     0    72
>>>             label/vr2-d0    ONLINE       0     0     0
>>>             label/vr2-d1    ONLINE       0     0     0
>>>             gpt/vr2-d2c     ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>>>             gpt/vr2-d3b     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-d4a     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             ada14           ONLINE       0     0     0
>>>             label/vr2-d6    ONLINE       0     0     0
>>>             label/vr2-d7c   ONLINE       0     0     0
>>>             label/vr2-d8    ONLINE       0     0     0
>>>           raidz1-1          ONLINE       0     0     0
>>>             gpt/vr2-e0      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e1      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e2      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e3      ONLINE       0     0     0
>>>             gpt/vr2-e4      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e5      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e6      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e7      ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>
>>> errors: 43 data errors, use '-v' for a list
>> The checksum errors will appear on the raidz vdev instead of a leaf if
>> vdev_raidz.c can't determine which leaf vdev was responsible. This
>> could happen if two or more leaf vdevs return bad data for the same
>> block, which would also lead to unrecoverable data errors. I see that
>> you have some unrecoverable data errors, so maybe that's what happened
>> to you.
>>
>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
>> to determine which child was responsible for a checksum error.
>> However, I've only seen that happen when a raidz vdev has a mirror
>> child. That can only happen if the child is a spare or replacing
>> vdev.
>> Did you activate any spares, or did you manually replace a
>> vdev?
>>
>> -Alan
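
Alan's point about when a checksum error can be pinned on a leaf can be illustrated with a toy model. The sketch below is not the logic in vdev_raidz.c. It assumes a single-parity stripe whose parity is the XOR of the data columns, uses CRC32 as a stand-in for the block checksum, and the name attribute_checksum_error is hypothetical. With one bad column, rebuilding that column from parity makes the checksum match, so the leaf is identified. With two bad columns no single rebuild matches, so the error can only be charged to the raidz vdev as a whole.

```python
import zlib
from functools import reduce


def xor_columns(columns):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*columns))


def attribute_checksum_error(data, parity, expected):
    """Classify one single-parity stripe (toy model, hypothetical helper).

    Returns "ok" if the data already matches the expected checksum,
    ("leaf", i) if rebuilding column i from parity fixes the checksum,
    or "vdev" if no single-column rebuild produces a matching checksum.
    """
    if zlib.crc32(b"".join(data)) == expected:
        return "ok"
    for i in range(len(data)):
        # Assume column i is the bad one and rebuild it from the others.
        others = data[:i] + data[i + 1:]
        candidate = xor_columns(others + [parity])
        trial = data[:i] + [candidate] + data[i + 1:]
        if zlib.crc32(b"".join(trial)) == expected:
            return ("leaf", i)
    return "vdev"


good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_columns(good)
expected = zlib.crc32(b"".join(good))  # stand-in for the stored block checksum

one_bad = [b"AAAA", b"XXXX", b"CCCC"]  # one column returned bad data
two_bad = [b"AAAA", b"XXXX", b"YYYY"]  # two columns returned bad data

print(attribute_checksum_error(good, parity, expected))     # ok
print(attribute_checksum_error(one_bad, parity, expected))  # ('leaf', 1)
print(attribute_checksum_error(two_bad, parity, expected))  # vdev
```

The two_bad case corresponds to the situation in the paste above: CKSUM counts on the raidz1-0 and pool lines, zeros on every leaf, and unrecoverable data errors.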