Date: Thu, 8 Sep 2016 09:39:59 +0200
From: Maurizio Vairani <maurizio.vairani@cloverinformatica.it>
To: Ruslan Makhmatkhanov <rm@FreeBSD.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS-8000-8A: assistance needed
Message-ID: <b9edb1ae-b59a-aefc-f547-1fb69e79f0f7@cloverinformatica.it>
In-Reply-To: <c6e3d35a-d554-a809-4959-ee858c38aca7@FreeBSD.org>
References: <c6e3d35a-d554-a809-4959-ee858c38aca7@FreeBSD.org>
Hi Ruslan,

On 06/09/2016 22:00, Ruslan Makhmatkhanov wrote:
> Hello,
>
> I've got something new here and I'm just not sure where to start on
> solving it. It's on 10.2-RELEASE-p7 amd64.
>
> """
> root:~ # zpool status -xv
>   pool: storage_ssd
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 0h26m with 5 errors on Tue Aug 23 00:40:24 2016
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         storage_ssd       ONLINE       0     0 59.3K
>           mirror-0        ONLINE       0     0     0
>             gpt/drive-06  ONLINE       0     0     0
>             gpt/drive-07  ONLINE       0     0     9
>           mirror-1        ONLINE       0     0  119K
>             gpt/drive-08  ONLINE       0     0  119K
>             gpt/drive-09  ONLINE       0     0  119K
>         cache
>           mfid5           ONLINE       0     0     0
>           mfid6           ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         <0x1bd0a>:<0x8>
>         <0x31f23>:<0x8>
>         /storage_ssd/f262f6ebaf5011e39ca7047d7bb28f4a/disk
>         /storage_ssd/7ba3f661fa9811e3bd9d047d7bb28f4a/disk
>         /storage_ssd/2751d305ecba11e3aef0047d7bb28f4a/disk
>         /storage_ssd/6aa805bd22e911e4b470047d7bb28f4a/disk
> """
>
> The pool looks ok, if I understand correctly, but we have a slowdown
> in the Xen VMs that use these disks via iSCSI. So can anybody please
> explain what exactly that means?

The OS retries the failed read and/or write operations, and you notice
that as a slowdown.

> 1. Am I right that we have a hardware failure that led to data
> corruption?

Yes.

> If so, how to identify the failed disk(s)?

The disk containing gpt/drive-07, the disk with gpt/drive-08, and the
disk with gpt/drive-09. With smartctl you can read the SMART status of
the disks for more info. I use smartd with both HDDs and SSDs, and it
usually warns me about a failing disk before ZFS does.

> And how is it possible that data is corrupted on a ZFS mirror?

It happens when the sectors holding the same data are damaged on both
disks of the mirror.
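You can pick the affected members out of the CKSUM column with a
one-liner. This is only a sketch: the config section from the report
above is embedded in a heredoc, and on the live system you would pipe in
the output of "zpool status -v storage_ssd" instead.

```shell
# List vdev members whose CKSUM column (field 5) is nonzero.
# Sample data from the zpool status report above; replace the heredoc
# with "zpool status -v storage_ssd |" on the real machine.
awk '$1 ~ /^gpt\// && $5 != "0" { print $1, $5 }' <<'EOF'
        NAME              STATE     READ WRITE CKSUM
        storage_ssd       ONLINE       0     0 59.3K
          mirror-0        ONLINE       0     0     0
            gpt/drive-06  ONLINE       0     0     0
            gpt/drive-07  ONLINE       0     0     9
          mirror-1        ONLINE       0     0  119K
            gpt/drive-08  ONLINE       0     0  119K
            gpt/drive-09  ONLINE       0     0  119K
EOF
# prints:
# gpt/drive-07 9
# gpt/drive-08 119K
# gpt/drive-09 119K
```

The three lines printed are exactly the mirror members you need to
investigate with smartctl.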
> Is there anything I can do to recover except restoring from backup?

Probably not, but you can check in the Xen VM whether the iSCSI disk is
still usable.

> 2. What are the first and second damaged "files", and why are they
> shown like that?

They are ZFS metadata: entries that can no longer be resolved to a file
path are printed as <dataset>:<object> numbers.

> I have this in /var/log/messages, but to me it looks like an iSCSI
> message that springs up when accessing the damaged files:
>
> """
> kernel: (1:32:0/28): WRITE command returned errno 122
> """

Probably in /var/log/messages you can also read messages like these:

Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): CAM status: ATA Status Error
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): RES: 51 40 e8 0f a6 40 44 00 00 08 00
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): Error 5, Retries exhausted

In these messages the /dev/ada3 HDD is failing.

> A manual zpool scrub was tried on this pool, to no avail. The pool
> is only 66% full.
>
> Thanks for any hints in advance.

Maurizio
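To see which disk those kernel messages point at, you can count the CAM
error lines per device. Again just a sketch: the sample lines quoted
above are embedded in a heredoc, and on the server you would run the
same pipeline against /var/log/messages instead.

```shell
# Count CAM/ATA error messages per adaN device. Sample data from the
# log excerpt above; on a real system use:
#   grep -Eo '\(ada[0-9]+:[^)]*\)' /var/log/messages | ...
grep -Eo '\(ada[0-9]+:[^)]*\)' <<'EOF' | cut -d: -f1 | tr -d '(' | sort | uniq -c | awk '{print $1, $2}'
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): CAM status: ATA Status Error
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): Error 5, Retries exhausted
EOF
# prints: 3 ada3
```

A device that keeps accumulating such errors is the one to replace; its
SMART attributes can then be inspected with, e.g., smartctl -a /dev/ada3
from the sysutils/smartmontools port.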