Date: Sat, 7 Oct 2017 15:08:48 +0200
From: Fabian Keil
To: Ben RUBSON
Cc: Freebsd fs, Edward Tomasz Napierała
Subject: Re: ZFS stalled after some mirror disks were lost
Message-ID: <20171007150848.7d50cad4@fabiankeil.de>
In-Reply-To: <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com>
 <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>

Ben RUBSON wrote:

> So first, many thanks again to Andriy, we spent almost 3 hours debugging
> the stalled server to find the root cause of the issue.
>
> Sounds like I would need help from the iSCSI dev team (Edward perhaps?),
> as the issue seems to be on this side.

Maybe.

> Here is Andriy's conclusion after the debug session, I quote him:
>
> > So, it seems that the root cause of all evil is this outstanding zio
> > (it might not be the only one).
> > In other words, it looks like the iSCSI stack bailed out without
> > completing all outstanding i/o requests that it had.
> > It should either return success or an error for every request; it
> > cannot simply drop a request.
> > And that appears to be what happened here.
>
> > It looks like ZFS is fragile in the face of this type of error.

Indeed. In the face of other types of errors as well, though.

> > Essentially, each logical i/o request obtains a configuration lock of
> > type 'zio' in shared mode to prevent certain configuration changes
> > from happening while there are any outstanding zio-s.
> > If a zio is lost, then this lock is leaked.
> > Then, the code that deals with vdev failures tries to take this lock
> > in exclusive mode while holding a few other configuration locks, also
> > in exclusive mode, so any other thread needing those locks would block.
> > And there are code paths where a configuration lock is taken while
> > spa_namespace_lock is held.
> > And when spa_namespace_lock is never dropped, the system is close to
> > toast, because all pool lookups would get stuck.
> > I don't see how this can be fixed in ZFS.
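The deadlock pattern described above is easy to model outside the
kernel. Below is a minimal userland sketch, using a plain pthread
rwlock rather than the real spa_config_enter()/spa_config_exit()
interfaces from spa_misc.c, of how one leaked shared hold on the 'zio'
configuration lock turns the vdev-failure path's exclusive acquisition
into a permanent stall:

  /*
   * Toy model of the stall described above, not ZFS code.  In ZFS the
   * 'zio' configuration lock is taken via spa_config_enter(spa,
   * SCL_ZIO, tag, RW_READER) and dropped via spa_config_exit(); an
   * ordinary pthread rwlock stands in for it here.
   */
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t scl_zio = PTHREAD_RWLOCK_INITIALIZER;

  /*
   * A logical i/o holds the config lock shared until its completion
   * path runs.  If the backing store (iSCSI in this thread's case)
   * drops the request instead of finishing it with success or error,
   * completion never runs and the shared hold is leaked.
   */
  static void *logical_io(void *arg)
  {
          (void)arg;
          pthread_rwlock_rdlock(&scl_zio);
          for (;;)
                  pause();        /* the completion that never comes */
          return NULL;
  }

  int main(void)
  {
          pthread_t t;

          pthread_create(&t, NULL, logical_io, NULL);
          sleep(1);       /* let the "zio" take its shared hold */

          /*
           * The vdev-failure code wants the same lock exclusively (and
           * in the real kernel it already holds other configuration
           * locks in exclusive mode, with some callers holding
           * spa_namespace_lock on top).  With one reader hold leaked,
           * this blocks forever and everything queued behind it,
           * including pool lookups, stalls as well.
           */
          fprintf(stderr, "vdev failure path: taking the lock exclusively...\n");
          pthread_rwlock_wrlock(&scl_zio);        /* hangs for good */
          fprintf(stderr, "unreachable\n");
          return 0;
  }

Compiled with "cc -o lockleak lockleak.c -lpthread", the program prints
the "taking the lock exclusively" line and then hangs, which is the
userland equivalent of the stuck kernel threads Andriy found.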
While I haven't used iSCSI for a while now, over the years I've seen
lots of similar issues with ZFS pools located on external USB disks and
ggate devices (backed by systems with patches for the known data
corruption issues).

At least in my opinion, many of the various known spa_namespace_lock
issues are plain ZFS issues and could be fixed in ZFS if someone was
motivated enough to spend the time to actually do it (and then jump
through the various "upstreaming" hoops). In many cases tolerable
workarounds exist, though.

Here's an example workaround that I've been using for a while now:
https://www.fabiankeil.de/sourcecode/electrobsd/ElectroBSD-r312620-6cfa243f1516/0222-ZFS-Optionally-let-spa_sync-wait-until-at-least-one-v.diff

According to the commit message, the issue was previously mentioned on
freebsd-current@ in 2014, but I no longer remember all the details and
didn't look them up.

I'm not claiming that the patch or the other workarounds I'm aware of
would actually help with your ZFS stalls at all, but it's not obvious
to me that your problems can actually be blamed on the iSCSI code
either. Did you try to reproduce the problem without iSCSI?

BTW, here's another (unrelated but somewhat hilarious) example of a
known OpenZFS issue that next to nobody seems to care about:
https://lists.freebsd.org/pipermail/freebsd-fs/2017-August/025110.html

I no longer care about this issue either (and thus really can't
complain), but I was a bit surprised that issues like this one survive
for so many years in an "enterprise" file system like ZFS.

Anyway, good luck with your ZFS-on-iSCSI issue(s).

Fabian