Date: Sat, 7 Oct 2017 15:08:48 +0200
From: Fabian Keil
To: Ben RUBSON
Cc: Freebsd fs, Edward Tomasz Napierała
Subject: Re: ZFS stalled after some mirror disks were lost
Message-ID: <20171007150848.7d50cad4@fabiankeil.de>
In-Reply-To: <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com>
 <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>

Ben RUBSON wrote:

> So first, many thanks again to Andriy, we spent almost 3 hours debugging
> the stalled server to find the root cause of the issue.
>
> Sounds like I would need help from the iSCSI dev team (Edward perhaps?),
> as the issue seems to be on this side.

Maybe.

> Here is Andriy's conclusion after the debug session, I quote him:
>
> > So, it seems that the root cause of all evil is this outstanding zio
> > (it might not be the only one).
> > In other words, it looks like the iSCSI stack bailed out without
> > completing all outstanding i/o requests that it had.
> > It should either return success or an error for every request; it
> > cannot simply drop a request.
> > And that appears to be what happened here.
>
> > It looks like ZFS is fragile in the face of this type of error.

Indeed. In the face of other types of errors as well, though.

> > Essentially, each logical i/o request obtains a configuration lock of
> > type 'zio' in shared mode to prevent certain configuration changes
> > from happening while there are any outstanding zio-s.
> > If a zio is lost, then this lock is leaked.
> > Then, the code that deals with vdev failures tries to take this lock
> > in exclusive mode while holding a few other configuration locks, also
> > in exclusive mode, so any other thread needing those locks would block.
> > And there are code paths where a configuration lock is taken while
> > spa_namespace_lock is held.
> > And when spa_namespace_lock is never dropped, the system is close to
> > toast, because all pool lookups would get stuck.
> > I don't see how this can be fixed in ZFS.
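The deadlock pattern described above is easy to model outside the
kernel. Below is a minimal userland sketch, using a plain pthread
rwlock rather than the real spa_config_enter()/spa_config_exit()
interfaces from spa_misc.c, of how one leaked shared hold on the 'zio'
configuration lock turns the vdev-failure path's exclusive acquisition
into a permanent stall:

  /*
   * Toy model of the stall described above, not ZFS code.  In ZFS the
   * 'zio' configuration lock is taken via spa_config_enter(spa,
   * SCL_ZIO, tag, RW_READER) and dropped via spa_config_exit(); an
   * ordinary pthread rwlock stands in for it here.
   */
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t scl_zio = PTHREAD_RWLOCK_INITIALIZER;

  /*
   * A logical i/o holds the config lock shared until its completion
   * path runs.  If the backing store (iSCSI in this thread's case)
   * drops the request instead of finishing it with success or error,
   * completion never runs and the shared hold is leaked.
   */
  static void *logical_io(void *arg)
  {
          (void)arg;
          pthread_rwlock_rdlock(&scl_zio);
          for (;;)
                  pause();        /* the completion that never comes */
          return NULL;
  }

  int main(void)
  {
          pthread_t t;

          pthread_create(&t, NULL, logical_io, NULL);
          sleep(1);       /* let the "zio" take its shared hold */

          /*
           * The vdev-failure code wants the same lock exclusively (and
           * in the real kernel it already holds other configuration
           * locks in exclusive mode, with some callers holding
           * spa_namespace_lock on top).  With one reader hold leaked,
           * this blocks forever and everything queued behind it,
           * including pool lookups, stalls as well.
           */
          fprintf(stderr, "vdev failure path: taking the lock exclusively...\n");
          pthread_rwlock_wrlock(&scl_zio);        /* hangs for good */
          fprintf(stderr, "unreachable\n");
          return 0;
  }

Compiled with "cc -o lockleak lockleak.c -lpthread", the program prints
the "taking the lock exclusively" line and then hangs, which is the
userland equivalent of the stuck kernel threads Andriy found.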
While I haven't used iSCSI for a while now, over the years I've seen
lots of similar issues with ZFS pools located on external USB disks and
ggate devices (backed by systems with patches for the known data
corruption issues).

At least in my opinion, many of the various known spa_namespace_lock
issues are plain ZFS issues and could be fixed in ZFS if someone was
motivated enough to spend the time to actually do it (and then jump
through the various "upstreaming" hoops). In many cases tolerable
workarounds exist, though.

Here's an example workaround that I've been using for a while now:
https://www.fabiankeil.de/sourcecode/electrobsd/ElectroBSD-r312620-6cfa243f1516/0222-ZFS-Optionally-let-spa_sync-wait-until-at-least-one-v.diff

According to the commit message, the issue was previously mentioned on
freebsd-current@ in 2014, but I no longer remember all the details and
didn't look them up.

I'm not claiming that the patch or the other workarounds I'm aware of
would actually help with your ZFS stalls at all, but it's not obvious
to me that your problems can actually be blamed on the iSCSI code
either. Did you try to reproduce the problem without iSCSI?

BTW, here's another (unrelated but somewhat hilarious) example of a
known OpenZFS issue that next to nobody seems to care about:
https://lists.freebsd.org/pipermail/freebsd-fs/2017-August/025110.html

I no longer care about this issue either (and thus really can't
complain), but I was a bit surprised that issues like this one survive
for so many years in an "enterprise" file system like ZFS.

Anyway, good luck with your ZFS-on-iSCSI issue(s).

Fabian