Date: Sat, 5 Aug 2017 19:51:44 +0200
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: "Eugene M. Zheganin" <emz@norma.perm.ru>
Cc: freebsd-stable@FreeBSD.org, freebsd-fs@freebsd.org
Subject: Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
Message-ID: <20170805195144.1caf98dc@fabiankeil.de>
In-Reply-To: <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>
References: <1bd10b1e-0583-6f44-297e-3147f6daddc5@norma.perm.ru>
 <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>

"Eugene M. Zheganin" <emz@norma.perm.ru> wrote:

> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> >   pool: userdata
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K
> That would be funny if it weren't so sad, but while writing this
> message the pool started to look like the output below (I just ran
> zpool status twice in a row to compare):
>
> [root@san1:~]# zpool status userdata
>    pool: userdata
>   state: ONLINE
> status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://illumos.org/msg/ZFS-8000-8A
>    scan: none requested
> config:
>
>          NAME               STATE     READ WRITE CKSUM
>          userdata           ONLINE       0     0  728K
>            mirror-0         ONLINE       0     0 1,42M
>              gpt/userdata0  ONLINE       0     0 1,42M
>              gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>    pool: userdata
>   state: ONLINE
> status: One or more devices has experienced an error resulting in data
>          corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>          entire pool from backup.
>     see: http://illumos.org/msg/ZFS-8000-8A
>    scan: none requested
> config:
>
>          NAME               STATE     READ WRITE CKSUM
>          userdata           ONLINE       0     0  730K
>            mirror-0         ONLINE       0     0 1,43M
>              gpt/userdata0  ONLINE       0     0 1,43M
>              gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, you see, the error rate is increasing at something like the speed
> of light, and I'm not sure the data access rate is anywhere near that
> high; the counters look like they are climbing on their own.
> So maybe someone has an idea of what this really means.
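
As the status output already suggests, listing the four affected files
might help narrow things down; this is plain zpool usage, no patch
required:

  zpool status -v userdata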

Quoting a comment from sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:
/*
 * If destroy encounters an EIO while reading metadata (e.g. indirect
 * blocks), space referenced by the missing metadata can not be freed.
 * Normally this causes the background destroy to become "stalled", as
 * it is unable to make forward progress.  While in this stalled state,
 * all remaining space to free from the error-encountering filesystem is
 * "temporarily leaked".  Set this flag to cause it to ignore the EIO,
 * permanently leak the space from indirect blocks that can not be read,
 * and continue to free everything else that it can.
 *
 * The default, "stalling" behavior is useful if the storage partially
 * fails (i.e. some but not all i/os fail), and then later recovers.  In
 * this case, we will be able to continue pool operations while it is
 * partially failed, and when it recovers, we can continue to free the
 * space, with no leaks.  However, note that this case is actually
 * fairly rare.
 *
 * Typically pools either (a) fail completely (but perhaps temporarily,
 * e.g. a top-level vdev going offline), or (b) have localized,
 * permanent errors (e.g. disk returns the wrong data due to bit flip or
 * firmware bug).  In case (a), this setting does not matter because the
 * pool will be suspended and the sync thread will not be able to make
 * forward progress regardless.  In case (b), because the error is
 * permanent, the best we can do is leak the minimum amount of space,
 * which is what setting this flag will do.  Therefore, it is reasonable
 * for this flag to normally be set, but we chose the more conservative
 * approach of not setting it, so that there is no possibility of
 * leaking space in the "partial temporary" failure case.
 */
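
If the background destroy on your pool is indeed stalled, one way to
see it is to watch the pool's "freeing" property, which should shrink
over time while the destroy makes progress (a quick check with stock
tools; "userdata" taken from your output):

  # Space still to be reclaimed from destroyed datasets/snapshots.
  # If this stays constant across repeated checks, the destroy is stuck.
  zpool get freeing userdata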

In FreeBSD the "flag" currently isn't easily reachable due to the lack
of a powerful kernel debugger (like mdb in Solaris offsprings) but
it can be made reachable with a sysctl using the patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D218954
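
With that patch applied, enabling the behaviour should look something
like this (the variable behind the quoted comment is
zfs_free_leak_on_eio in spa_misc.c; the sysctl name below is my
assumption, so check the patch for the actual one):

  # Assumed sysctl name: permanently leak space from unreadable
  # indirect blocks instead of stalling the background destroy.
  sysctl vfs.zfs.free_leak_on_eio=1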

Fabian
