Date: Fri, 24 Oct 2014 09:33:22 -0600
From: Alan Somers <asomers@freebsd.org>
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: ZFS errors on the array but not the disk.
Message-ID: <CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw@mail.gmail.com>
In-Reply-To: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
References: <CACpH0MeAvs6rzWUo3uF8uTygPk6qnZE8W=3-zsiTAKdvm4N01w@mail.gmail.com>
On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:
> What does it mean when checksum errors appear on the array (and the vdev)
> but not on any of the disks? See the paste below. One would think that
> there isn't some ephemeral data stored somewhere that is not one of the
> disks, yet "cksum" errors show only on the vdev and the array lines. Help?
>
> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>   pool: vr2
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Thu Oct 23 23:11:29 2014
>         1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>         119G resilvered, 6.79% done
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         vr2                ONLINE       0     0    36
>           raidz1-0         ONLINE       0     0    72
>             label/vr2-d0   ONLINE       0     0     0
>             label/vr2-d1   ONLINE       0     0     0
>             gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>             gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             ada14          ONLINE       0     0     0
>             label/vr2-d6   ONLINE       0     0     0
>             label/vr2-d7c  ONLINE       0     0     0
>             label/vr2-d8   ONLINE       0     0     0
>           raidz1-1         ONLINE       0     0     0
>             gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e3     ONLINE       0     0     0
>             gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>
> errors: 43 data errors, use '-v' for a list

The checksum errors will appear on the raidz vdev instead of a leaf if
vdev_raidz.c can't determine which leaf vdev was responsible.
This could happen if two or more leaf vdevs return bad data for the same
block, which would also lead to unrecoverable data errors. I see that you
have some unrecoverable data errors, so maybe that's what happened to you.

Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable to
determine which child was responsible for a checksum error. However, I've
only seen that happen when a raidz vdev has a mirror child. That can only
happen if the child is a spare or replacing vdev. Did you activate any
spares, or did you manually replace a vdev?

-Alan
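
To illustrate why two bad leaves in the same block leave the error
unattributed, here is a toy single-parity model in Python. It is only a
sketch and not the code in vdev_raidz.c: the names xor() and blame(), the
use of SHA-256 as the block checksum, and the 4-byte columns are all
assumptions made for the example. The idea is to rebuild each data column
in turn from the others plus parity and see whether the block checksum
then matches.

```python
import hashlib
from functools import reduce

def xor(blocks):
    """XOR equal-length byte strings together (single-parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def blame(data_cols, parity, expected_cksum):
    """Call only after the block checksum has already failed.

    Rebuild each data column from the remaining columns plus parity.
    Return the index of the one column whose replacement makes the
    checksum match, or None if no single-column repair works.
    """
    for i in range(len(data_cols)):
        others = data_cols[:i] + data_cols[i + 1:]
        rebuilt = xor(others + [parity])
        candidate = data_cols[:i] + [rebuilt] + data_cols[i + 1:]
        if hashlib.sha256(b"".join(candidate)).digest() == expected_cksum:
            return i
    return None

# Hypothetical 3-column stripe with one parity column.
good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor(good)
cksum = hashlib.sha256(b"".join(good)).digest()

# One leaf returns bad data: the culprit can be identified.
print(blame([good[0], b"XXXX", good[2]], parity, cksum))   # 1

# Two leaves return bad data: no single repair works, so nothing
# can be charged to a specific leaf.
print(blame([b"XXXX", b"YYYY", good[2]], parity, cksum))   # None
```

In this model the None case corresponds to the situation described above:
the checksum failure is real, but it can only be counted against the raidz
vdev as a whole, and the block itself is unrecoverable.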
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAOtMX2g5GYZqYgWNmD_K_TSdTc8oxvvpe4463ni=sEX_b7_Erw>