From owner-freebsd-fs@freebsd.org Thu Sep 8 07:45:40 2016
Subject: Re: ZFS-8000-8A: assistance needed
From: Maurizio Vairani <maurizio.vairani@cloverinformatica.it>
To: Ruslan Makhmatkhanov
Cc: freebsd-fs@freebsd.org
Date: Thu, 8 Sep 2016 09:39:59 +0200
List-Id: Filesystems

Hi Ruslan,

On 06/09/2016 22:00, Ruslan Makhmatkhanov wrote:
> Hello,
>
> I've got something new here and just not sure where to start on
> solving that. It's on 10.2-RELEASE-p7 amd64.
>
> """
> root:~ # zpool status -xv
>   pool: storage_ssd
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 0h26m with 5 errors on Tue Aug 23 00:40:24 2016
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         storage_ssd       ONLINE       0     0 59.3K
>           mirror-0        ONLINE       0     0     0
>             gpt/drive-06  ONLINE       0     0     0
>             gpt/drive-07  ONLINE       0     0     9
>           mirror-1        ONLINE       0     0  119K
>             gpt/drive-08  ONLINE       0     0  119K
>             gpt/drive-09  ONLINE       0     0  119K
>         cache
>           mfid5           ONLINE       0     0     0
>           mfid6           ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         <0x1bd0a>:<0x8>
>         <0x31f23>:<0x8>
>         /storage_ssd/f262f6ebaf5011e39ca7047d7bb28f4a/disk
>         /storage_ssd/7ba3f661fa9811e3bd9d047d7bb28f4a/disk
>         /storage_ssd/2751d305ecba11e3aef0047d7bb28f4a/disk
>         /storage_ssd/6aa805bd22e911e4b470047d7bb28f4a/disk
> """
>
> The pool looks ok, if I understand correctly, but we have a slowdown
> in Xen VMs that are using these disks via iSCSI. So can anybody
> please explain what exactly that means?

The OS retries the failing read and/or write operations, and you notice those retries as a slowdown.

> 1. Am I right that we have a hardware failure that led to data
> corruption?

Yes.

> If so, how to identify failed disk(s)

The disks holding gpt/drive-07, gpt/drive-08 and gpt/drive-09: every leaf vdev with a nonzero CKSUM count. With smartctl you can read the SMART status of those disks for more information. I use smartd with HDDs and SSDs, and it usually warns me about a failing disk before ZFS does.

> and how it is possible that data is corrupted on zfs mirror?

It can happen when both disks of a mirror have damaged sectors holding the same data, so neither side can supply a good copy.

> Is there anything I can do to recover except restoring from backup?

Probably not, but you can check from inside the Xen VM whether the iSCSI disk is still usable.

> 2. What first and second damaged "files" are and why they are shown
> like that?

They are ZFS metadata objects; since they have no path name, zpool prints them as <dataset>:<object> numbers.
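By the way, the leaf vdevs with a nonzero CKSUM counter can be picked out of the zpool output mechanically. A small sketch: the sample lines below are pasted from your status above; on the live system you would pipe `zpool status storage_ssd` through the same awk filter instead.

```shell
# Filter vdev lines whose CKSUM column (field 5) is nonzero.
# Sample lines pasted from the `zpool status` output above;
# on the live system: zpool status storage_ssd | awk '$5 != "0" { print $1 }'
zpool_status='    gpt/drive-06  ONLINE  0  0  0
    gpt/drive-07  ONLINE  0  0  9
    gpt/drive-08  ONLINE  0  0  119K
    gpt/drive-09  ONLINE  0  0  119K'
failing=$(printf '%s\n' "$zpool_status" | awk '$5 != "0" { print $1 }')
echo "$failing"
```

On the real output you would only feed in the indented device lines; header and summary rows would need to be skipped first.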
>
> I have this in /var/log/messages, but to me it looks like an iSCSI
> message that springs up when accessing the damaged files:
>
> """
> kernel: (1:32:0/28): WRITE command returned errno 122
> """

Probably in /var/log/messages you can also read messages like these:

Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): CAM status: ATA Status Error
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): RES: 51 40 e8 0f a6 40 44 00 00 08 00
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): Error 5, Retries exhausted

In these messages the /dev/ada3 HDD is failing.

> Manual zpool scrub was tried on this pool to no avail. The pool
> capacity is only 66% full.
>
> Thanks for any hints in advance.

Maurizio
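P.S. A quick way to pull the failing device names out of such log lines; a sketch that assumes the usual FreeBSD "kernel: (device:channel:...)" prefix, with two sample lines inlined (on the live system you would point sed at /var/log/messages):

```shell
# Extract the CAM peripheral name (e.g. ada3) from kernel error lines.
# Sample lines inlined; live: sed -n '...' /var/log/messages | sort -u
log='Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Aug 27 03:02:19 clover-nas2 kernel: (ada3:ahcich15:0:0:0): Error 5, Retries exhausted'
devs=$(printf '%s\n' "$log" |
    sed -n 's/.*kernel: (\([a-z]\{1,\}[0-9]\{1,\}\):.*/\1/p' |
    sort -u)
echo "$devs"
```

Each unique name it prints is a disk worth checking with smartctl.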