From owner-freebsd-fs@freebsd.org Sat Oct  7 13:57:37 2017
Subject: Re: ZFS stalled after some mirror disks were lost
From: Ben RUBSON <ben.rubson@gmail.com>
In-Reply-To: <20171007150848.7d50cad4@fabiankeil.de>
Date: Sat, 7 Oct 2017 15:57:30 +0200
Cc: Edward Tomasz Napierała, Fabian Keil, mav@freebsd.org
To: Freebsd fs <freebsd-fs@freebsd.org>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com> <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com> <20171007150848.7d50cad4@fabiankeil.de>
List-Id: Filesystems

> On 07 Oct 2017, at 15:08, Fabian Keil wrote:
> 
> Ben RUBSON wrote:
> 
>> So first, many thanks again to Andriy, we spent almost 3 hours debugging
>> the stalled server to find the root cause of the issue.
>> 
>> Sounds like I would need help from the iSCSI dev team (Edward perhaps?),
>> as the issue seems to be on that side.
> 
> Maybe.
> 
>> Here is Andriy's conclusion after the debug session; I quote him:
>> 
>>> So, it seems that the root cause of all evil is this outstanding zio
>>> (it might not be the only one).
>>> In other words, it looks like the iSCSI stack bailed out without
>>> completing all outstanding I/O requests that it had.
>>> It should either return success or error for every request; it cannot
>>> simply drop a request.
>>> And that appears to be what happened here.
>> 
>>> It looks like ZFS is fragile in the face of this type of error.
> 
> Indeed. In the face of other types of errors as well, though.
> 
>>> Essentially, each logical I/O request obtains a configuration lock of
>>> type 'zio' in shared mode to prevent certain configuration changes
>>> from happening while there are any outstanding zios.
>>> If a zio is lost, then this lock is leaked.
>>> Then, the code that deals with vdev failures tries to take this lock in
>>> exclusive mode while holding a few other configuration locks, also in
>>> exclusive mode, so any other thread needing those locks would block.
>>> And there are code paths where a configuration lock is taken while
>>> spa_namespace_lock is held.
>>> And when spa_namespace_lock is never dropped, the system is close
>>> to toast, because all pool lookups would get stuck.
>>> I don't see how this can be fixed in ZFS.
> 
> While I haven't used iSCSI for a while now, over the years I've seen
> lots of similar issues with ZFS pools located on external USB disks
> and ggate devices (backed by systems with patches for the known data
> corruption issues).
> 
> At least in my opinion, many of the various known spa_namespace_lock
> issues are plain ZFS issues and could be fixed in ZFS if someone was
> motivated enough to spend the time to actually do it (and then jump
> through the various "upstreaming" hoops).
> 
> In many cases tolerable workarounds exist, though, and sometimes they
> work around some of the issues well enough.
> Here's an example workaround that I've been using for a while now:
> https://www.fabiankeil.de/sourcecode/electrobsd/ElectroBSD-r312620-6cfa243f1516/0222-ZFS-Optionally-let-spa_sync-wait-until-at-least-one-v.diff
> 
> According to the commit message the issue was previously mentioned on
> freebsd-current@ in 2014, but I no longer remember all the details and
> didn't look them up.

There's no mention of a code revision in that thread. It finishes with a
message from Alexander Motin:
"(...) I've got to the conclusion that ZFS is in many places written in a way
that simply does not expect errors. In such cases it just gets stuck, waiting
for the disk to reappear and I/O to complete. (...)"

> I'm not claiming that the patch or other workarounds I'm aware of
> would actually help with your ZFS stalls at all, but it's not obvious
> to me that your problems can actually be blamed on the iSCSI code
> either.
> 
> Did you try to reproduce the problem without iSCSI?

No, I would have to pull disks out of their slots (well...), or shut down
the SAS2008-IT adapter, or put disks offline (not sure how to do the
latter two).
I will test in the next few hours without GPT labels and GEOM labels,
as I use them and Andriy suspects they could be the culprit.

> Anyway, good luck with your ZFS-on-iSCSI issue(s).

Thank you very much Fabian for your help and contribution,
I really hope we'll find the root cause of this issue,
as it's quite annoying in an HA-expected production environment :/

Ben