Date: Sat, 25 Oct 2014 04:02:16 +0100
From: Steven Hartland <smh@freebsd.org>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS errors on the array but not the disk.
Message-ID: <544B12B8.8060302@freebsd.org>
In-Reply-To: <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
 <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
 <CACpH0MfL1J8fbP+Mkdop8C=iTJmvscDv16mVynSqXC0uspdLfw@mail.gmail.com>
There was an issue which would cause resilver restarts, fixed by r265253
<https://svnweb.freebsd.org/base?view=revision&revision=265253>, which was
MFC'ed to stable/10 by r271683
<https://svnweb.freebsd.org/base?view=revision&revision=271683>, so you'll
want to make sure you're later than that.

On 24/10/2014 19:42, Zaphod Beeblebrox wrote:
> I manually replaced a disk... and the array was scrubbed recently.
> Interestingly, I seem to be in the "endless loop" of resilvering problem.
> Not much I can find on it, but resilvering will complete and I can then
> run another scrub. It will complete, too. Then rebooting causes another
> resilvering.
>
> Another odd data point: it seems as if the things that show up as "errors"
> change from resilvering to resilvering.
>
> One bug, it would seem, is that once ZFS has detected an error... another
> scrub can reset it, but no attempt is made to read through the error if you
> access the object directly.
>
> On Fri, Oct 24, 2014 at 11:33 AM, Alan Somers <asomers@freebsd.org> wrote:
>
>> On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com>
>> wrote:
>>> What does it mean when checksum errors appear on the array (and the vdev)
>>> but not on any of the disks? See the paste below. One would think that
>>> there isn't some ephemeral data stored somewhere that is not one of the
>>> disks, yet "cksum" errors show only on the vdev and the array lines.
>>> Help?
>>>
>>> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>>>   pool: vr2
>>>  state: ONLINE
>>> status: One or more devices is currently being resilvered.  The pool will
>>>         continue to function, possibly in a degraded state.
>>> action: Wait for the resilver to complete.
>>>   scan: resilver in progress since Thu Oct 23 23:11:29 2014
>>>         1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>>>         119G resilvered, 6.79% done
>>> config:
>>>
>>>         NAME               STATE     READ WRITE CKSUM
>>>         vr2                ONLINE       0     0    36
>>>           raidz1-0         ONLINE       0     0    72
>>>             label/vr2-d0   ONLINE       0     0     0
>>>             label/vr2-d1   ONLINE       0     0     0
>>>             gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>>>             gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             ada14          ONLINE       0     0     0
>>>             label/vr2-d6   ONLINE       0     0     0
>>>             label/vr2-d7c  ONLINE       0     0     0
>>>             label/vr2-d8   ONLINE       0     0     0
>>>           raidz1-1         ONLINE       0     0     0
>>>             gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e3     ONLINE       0     0     0
>>>             gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>             gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>>>
>>> errors: 43 data errors, use '-v' for a list
>>
>> The checksum errors will appear on the raidz vdev instead of a leaf if
>> vdev_raidz.c can't determine which leaf vdev was responsible. This
>> could happen if two or more leaf vdevs return bad data for the same
>> block, which would also lead to unrecoverable data errors. I see that
>> you have some unrecoverable data errors, so maybe that's what happened
>> to you.
>>
>> Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable
>> to determine which child was responsible for a checksum error.
>> However, I've only seen that happen when a raidz vdev has a mirror
>> child. That can only happen if the child is a spare or replacing
>> vdev.
>> Did you activate any spares, or did you manually replace a vdev?
>>
>> -Alan
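[As a practical footnote to the advice above: Steven's "make sure you're later than r271683" can be checked from the kernel ident string. A minimal sketch, assuming a kernel built from an svn checkout so that `uname -v` embeds an r-number; the `kernel_ident` sample string here is hypothetical, and on a real system you would use `kernel_ident="$(uname -v)"` instead.]

```shell
# Hypothetical ident string; on a live box use: kernel_ident="$(uname -v)"
kernel_ident="FreeBSD 10.1-STABLE #0 r271700: Mon Sep 22 10:00:00 UTC 2014"

# Extract the svn revision (the "rNNNNNN:" token) from the ident string.
rev=$(printf '%s\n' "$kernel_ident" | sed -n 's/.* r\([0-9][0-9]*\):.*/\1/p')

# Compare against the stable/10 MFC of the resilver-restart fix.
if [ -n "$rev" ] && [ "$rev" -ge 271683 ]; then
    echo "kernel r$rev includes the r271683 resilver-restart MFC"
else
    echo "kernel predates r271683 (or no revision found); update stable/10"
fi
```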
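[Alan's point, that the CKSUM counts sit on the pool and raidz rows while every leaf shows zero, is easy to see by filtering the status output. A sketch using an abridged, hypothetical sample modeled on the paste above rather than live `zpool` output:]

```shell
# Abridged, hypothetical sample modeled on the zpool status paste above.
zpool_sample='NAME              STATE     READ WRITE CKSUM
vr2               ONLINE       0     0    36
  raidz1-0        ONLINE       0     0    72
    label/vr2-d0  ONLINE       0     0     0
    label/vr2-d1  ONLINE       0     0     0'

# Print only rows with a nonzero CKSUM count.  The header row is skipped
# naturally because the string "CKSUM" coerces to 0 in $5+0.
printf '%s\n' "$zpool_sample" | awk '$5+0 > 0 { print $1, "cksum=" $5 }'
```

[On the live system, `zpool status -v vr2` shows the same counters plus the list of the 43 damaged files that the summary line refers to.]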