Date: Sat, 5 Aug 2017 19:51:44 +0200
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: "Eugene M. Zheganin" <emz@norma.perm.ru>
Cc: freebsd-stable@FreeBSD.org, freebsd-fs@freebsd.org
Subject: Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
Message-ID: <20170805195144.1caf98dc@fabiankeil.de>
In-Reply-To: <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>
References: <1bd10b1e-0583-6f44-297e-3147f6daddc5@norma.perm.ru> <1d53f489-5135-7633-fef4-35d26e4969dc@norma.perm.ru>
"Eugene M. Zheganin" <emz@norma.perm.ru> wrote:

> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> >   pool: userdata
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K
>
> That would be funny, if not that sad, but while writing this message,
> the pool started to look like below (I just asked zpool status twice in
> a row, comparing to what it was):
>
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  728K
>           mirror-0         ONLINE       0     0 1,42M
>             gpt/userdata0  ONLINE       0     0 1,42M
>             gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  730K
>           mirror-0         ONLINE       0     0 1,43M
>             gpt/userdata0  ONLINE       0     0 1,43M
>             gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, you see, the error rate is like speed of light. And I'm not sure if
> the data access rate is that enormous, looks like they are increasing on
> their own.
> So may be someone have an idea on what this really means.

Quoting a comment from
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:

/*
 * If destroy encounters an EIO while reading metadata (e.g. indirect
 * blocks), space referenced by the missing metadata can not be freed.
 * Normally this causes the background destroy to become "stalled", as
 * it is unable to make forward progress.  While in this stalled state,
 * all remaining space to free from the error-encountering filesystem is
 * "temporarily leaked".  Set this flag to cause it to ignore the EIO,
 * permanently leak the space from indirect blocks that can not be read,
 * and continue to free everything else that it can.
 *
 * The default, "stalling" behavior is useful if the storage partially
 * fails (i.e. some but not all i/os fail), and then later recovers.  In
 * this case, we will be able to continue pool operations while it is
 * partially failed, and when it recovers, we can continue to free the
 * space, with no leaks.  However, note that this case is actually
 * fairly rare.
 *
 * Typically pools either (a) fail completely (but perhaps temporarily,
 * e.g. a top-level vdev going offline), or (b) have localized,
 * permanent errors (e.g. disk returns the wrong data due to bit flip or
 * firmware bug).  In case (a), this setting does not matter because the
 * pool will be suspended and the sync thread will not be able to make
 * forward progress regardless.  In case (b), because the error is
 * permanent, the best we can do is leak the minimum amount of space,
 * which is what setting this flag will do.  Therefore, it is reasonable
 * for this flag to normally be set, but we chose the more conservative
 * approach of not setting it, so that there is no possibility of
 * leaking space in the "partial temporary" failure case.
 */

In FreeBSD the "flag" currently isn't easily reachable due to the lack
of a powerful kernel debugger (like mdb in the Solaris offspring), but
it can be made reachable with a sysctl using the patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954

Fabian
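
For illustration, a minimal sketch of how such a flag could be wired up
to a sysctl on a FreeBSD kernel of that vintage follows. This is not the
code from PR 218954; the sysctl name vfs.zfs.free_leak_on_eio and the
int-compatible declaration of zfs_free_leak_on_eio (the flag declared
next to the quoted comment in spa_misc.c) are assumptions made purely
for the sketch.

/*
 * Hypothetical sketch only: expose the spa_misc.c flag as a read/write
 * sysctl and loader tunable so it can be flipped without a kernel
 * debugger.  Upstream declares the flag as boolean_t; it is assumed to
 * be int-sized here.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

extern int zfs_free_leak_on_eio;	/* defined in spa_misc.c */

SYSCTL_DECL(_vfs_zfs);
SYSCTL_INT(_vfs_zfs, OID_AUTO, free_leak_on_eio, CTLFLAG_RWTUN,
    &zfs_free_leak_on_eio, 0,
    "Ignore EIO during destroy and permanently leak unreadable space");

With something along those lines compiled in, the behavior could then be
toggled at runtime with e.g. "sysctl vfs.zfs.free_leak_on_eio=1" (again,
the knob name here is an assumption; the actual patch in the PR above is
authoritative).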