From owner-freebsd-stable@freebsd.org Sun Aug 6 01:13:23 2017
From: Peter
Subject: Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
Date: Sun, 6 Aug 2017 02:10:40 +0200
To: freebsd-stable@FreeBSD.ORG
References: <1bd10b1e-0583-6f44-297e-3147f6daddc5@norma.perm.ru> <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>
In-Reply-To: <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>

Eugene M. Zheganin wrote:
> Hi,
>
> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
>>
>>   pool: userdata
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://illumos.org/msg/ZFS-8000-8A
>>   scan: none requested
>> config:
>>
>>         NAME               STATE     READ WRITE CKSUM
>>         userdata           ONLINE       0     0  216K
>>           mirror-0         ONLINE       0     0  432K
>>             gpt/userdata0  ONLINE       0     0  432K
>>             gpt/userdata1  ONLINE       0     0  432K
> That would be funny, if not that sad, but while writing this message,
> the pool started to look like below (I just asked zpool status twice in
> a row, comparing to what it was):
>
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  728K
>           mirror-0         ONLINE       0     0 1,42M
>             gpt/userdata0  ONLINE       0     0 1,42M
>             gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  730K
>           mirror-0         ONLINE       0     0 1,43M
>             gpt/userdata0  ONLINE       0     0 1,43M
>             gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, you see, the error rate is like speed of light. And I'm not sure if
> the data access rate is that enormous, looks like they are increasing on
> their own.
> So may be someone have an idea on what this really means.

It is remarkable that you always have the same error count on both
sides of the mirror.
From what I have seen, such a picture appears when an unrecoverable
error (i.e. one that is present on both sides of the mirror) is read
again and again.
File number 0x1 is probably some important metadata, and since it is
not readable it cannot be put into the ARC, so the read is retried
over and over.

An error that is present on only one side shows up only once, because
it is then auto-corrected. In that case the figures show some erratic
deviations.

Therefore it is worthwhile to remove the erroneous data soon, because
as long as it exists one does not get anything useful from the
figures (like how many errors are actually appearing anew).
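
To make the "same count on both sides" check less of an eyeball exercise, here is a minimal Python sketch that reads the config table of a `zpool status` listing and compares the CKSUM column of the mirror members. The function names (`parse_count`, `cksum_counts`, `sides_match`) are made up for this illustration, and the sketch assumes that the K/M suffixes are powers of 1024 and that the decimal separator may be a comma, as in the output quoted above. Because the printed counters are rounded, the comparison is only as precise as what `zpool status` shows.

```python
import re

# Suffix multipliers; base 1024 is an assumption about how the counters are abbreviated.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_count(field):
    """Convert a counter such as '0', '432K' or '1,42M' to an approximate integer."""
    m = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMGT]?)", field)
    if m is None:
        raise ValueError(f"unrecognized counter: {field!r}")
    number = float(m.group(1).replace(",", "."))  # accept a comma as decimal separator
    return int(round(number * _UNITS[m.group(2)]))


def cksum_counts(status_text):
    """Map each device name in the config table to its CKSUM counter."""
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            if len(fields) < 5:  # a blank line ends the table
                break
            counts[fields[0]] = parse_count(fields[4])
    return counts


def sides_match(counts, members):
    """True if all listed mirror members show the same CKSUM counter."""
    return len({counts[name] for name in members}) == 1


# Config table taken from the second listing quoted above.
SAMPLE = """\
        NAME               STATE     READ WRITE CKSUM
        userdata           ONLINE       0     0  728K
          mirror-0         ONLINE       0     0 1,42M
            gpt/userdata0  ONLINE       0     0 1,42M
            gpt/userdata1  ONLINE       0     0 1,42M

errors: 4 data errors, use '-v' for a list
"""

counts = cksum_counts(SAMPLE)
print(counts)
print("sides identical:", sides_match(counts, ["gpt/userdata0", "gpt/userdata1"]))
```

Identical counters on both members would fit the picture of one unrecoverable block being re-read; counters that drift apart would rather point at one-sided errors that get corrected as they are found.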