Date: Sat, 7 Oct 2017 15:57:30 +0200
From: Ben RUBSON <ben.rubson@gmail.com>
To: Freebsd fs <freebsd-fs@freebsd.org>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>, Fabian Keil <freebsd-listen@fabiankeil.de>, mav@freebsd.org
Subject: Re: ZFS stalled after some mirror disks were lost
Message-ID: <DFD0528D-549E-44C9-A093-D4A8837CB499@gmail.com>
In-Reply-To: <20171007150848.7d50cad4@fabiankeil.de>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com> <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com> <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com> <20171007150848.7d50cad4@fabiankeil.de>
> On 07 Oct 2017, at 15:08, Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> 
> Ben RUBSON <ben.rubson@gmail.com> wrote:
> 
>> So first, many thanks again to Andriy, we spent almost 3 hours debugging
>> the stalled server to find the root cause of the issue.
>> 
>> Sounds like I would need help from the iSCSI dev team (Edward perhaps?),
>> as the issue seems to be on this side.
> 
> Maybe.
> 
>> Here is Andriy's conclusion after the debug session, I quote him:
>> 
>>> So, it seems that the root cause of all evil is this outstanding zio
>>> (it might not be the only one).
>>> In other words, it looks like the iSCSI stack bailed out without
>>> completing all outstanding i/o requests that it had.
>>> It should either return success or error for every request, it cannot
>>> simply drop a request.
>>> And that appears to be what happened here.
>> 
>>> It looks like ZFS is fragile in the face of this type of error.
> 
> Indeed. In the face of other types of errors as well, though.
> 
>>> Essentially, each logical i/o request obtains a configuration lock of
>>> type 'zio' in shared mode to prevent certain configuration changes
>>> from happening while there are any outstanding zio-s.
>>> If a zio is lost, then this lock is leaked.
>>> Then, the code that deals with vdev failures tries to take this lock in
>>> exclusive mode while holding a few other configuration locks also in
>>> exclusive mode, so any other thread needing those locks would block.
>>> And there are code paths where a configuration lock is taken while
>>> spa_namespace_lock is held.
>>> And when spa_namespace_lock is never dropped then the system is close
>>> to toast, because all pool lookups would get stuck.
>>> I don't see how this can be fixed in ZFS.
> 
> While I haven't used iSCSI for a while now, over the years I've seen
> lots of similar issues with ZFS pools located on external USB disks
> and ggate devices (backed by systems with patches for the known data
> corruption issues).
> 
> At least in my opinion, many of the various known spa_namespace_lock
> issues are plain ZFS issues and could be fixed in ZFS if someone was
> motivated enough to spend the time to actually do it (and then jump
> through the various "upstreaming" hoops).
> 
> In many cases tolerable workarounds exist, though, and sometimes they
> work around some of the issues well enough. Here's an example workaround
> that I've been using for a while now:
> https://www.fabiankeil.de/sourcecode/electrobsd/ElectroBSD-r312620-6cfa243f1516/0222-ZFS-Optionally-let-spa_sync-wait-until-at-least-one-v.diff
> 
> According to the commit message the issue was previously mentioned on
> freebsd-current@ in 2014 but I no longer remember all the details and
> didn't look them up.

There's no mention of a code revision in that thread. It finishes with
a message from Alexander Motin:
"(...) I've come to the conclusion that ZFS in many places is written
in a way that simply does not expect errors. In such cases it just
gets stuck, waiting for the disk to reappear and I/O to complete. (...)"

> I'm not claiming that the patch or other workarounds I'm aware of
> would actually help with your ZFS stalls at all, but it's not obvious
> to me that your problems can actually be blamed on the iSCSI code
> either.
> 
> Did you try to reproduce the problem without iSCSI?

No, I would have to pull out disks from their slots (well...), or shut
down the SAS2008-IT adapter, or put disks offline (not sure how-to for
these two).
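If there is a software-only way to do it, I guess "zpool offline
<pool> <disk>" would at least take a vdev out gracefully. For an
abrupt disappearance, closer to losing an iSCSI disk, perhaps the pool
could be built on gnop(8) providers instead: "gnop create da0", create
the pool on da0.nop, then "gnop destroy -f da0.nop" while the pool is
busy, so the provider vanishes under ZFS. Not sure either of these
would exercise exactly the same code paths as the iSCSI loss, though.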
I will test in the next few hours without GPT labels and GEOM labels,
as I use them and Andriy suspects they could be the culprit.

> Anyway, good luck with your ZFS-on-iSCSI issue(s).

Thank you very much Fabian for your help and contribution. I really
hope we'll find the root cause of this issue, as it's quite annoying
in a production environment expected to be HA :/

Ben
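P.S. To make the lock leak Andriy described above a bit more concrete,
here is a minimal stand-alone C sketch. It is illustrative only: the
names (zio_issue, zio_done, vdev_failure, scl_zio) are made up for the
example, and this is not the actual ZFS code (which, as far as I
understand, takes the 'zio' configuration lock via spa_config_enter()
and spa_config_exit()). One shared hold per outstanding i/o, dropped
only on completion; lose one completion and the exclusive taker waits
forever:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for the 'zio' configuration lock (hypothetical model). */
static pthread_rwlock_t scl_zio = PTHREAD_RWLOCK_INITIALIZER;

/* Issuing a logical i/o takes the config lock in shared mode... */
void
zio_issue(void)
{
	pthread_rwlock_rdlock(&scl_zio);
}

/* ...and only the completion path (success OR error) releases it. */
void
zio_done(void)
{
	pthread_rwlock_unlock(&scl_zio);
}

/* The vdev-failure code wants the same lock exclusively. */
void *
vdev_failure(void *arg)
{
	(void)arg;
	printf("vdev failure: waiting for exclusive config lock...\n");
	pthread_rwlock_wrlock(&scl_zio);	/* blocks forever */
	printf("reconfigured\n");		/* never printed */
	pthread_rwlock_unlock(&scl_zio);
	return (NULL);
}

int
main(void)
{
	pthread_t t;

	zio_issue();	/* request handed to the transport (e.g. iSCSI) */
	/* The transport bails out without ever calling zio_done(). */
	pthread_create(&t, NULL, vdev_failure, NULL);
	sleep(2);
	printf("still stuck: the leaked shared hold blocks the writer\n");
	return (0);
}

Built with "cc sketch.c -lpthread", the writer never gets the lock
because nothing ever calls zio_done() for the lost request, which is
exactly the stall described above.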