From owner-freebsd-fs@freebsd.org  Fri Oct  6 10:09:00 2017
Return-Path: <owner-freebsd-fs@freebsd.org>
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 05F44E32C69;
 Fri,  6 Oct 2017 10:09:00 +0000 (UTC)
 (envelope-from ben.rubson@gmail.com)
Received: from mail-wm0-x244.google.com (mail-wm0-x244.google.com
 [IPv6:2a00:1450:400c:c09::244])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 9652C7C264;
 Fri,  6 Oct 2017 10:08:59 +0000 (UTC)
 (envelope-from ben.rubson@gmail.com)
Received: by mail-wm0-x244.google.com with SMTP id q132so6924175wmd.2;
 Fri, 06 Oct 2017 03:08:59 -0700 (PDT)
Received: from bens-mac.home (LFbn-MAR-1-445-220.w2-15.abo.wanadoo.fr.
 [2.15.38.220])
 by smtp.gmail.com with ESMTPSA id d17sm985661wrc.13.2017.10.06.03.08.56
 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Fri, 06 Oct 2017 03:08:57 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: ZFS stalled after some mirror disks were lost
From: Ben RUBSON <ben.rubson@gmail.com>
In-Reply-To: <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>
Date: Fri, 6 Oct 2017 12:08:55 +0200
Cc: Freebsd fs <freebsd-fs@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com>
 <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>
To: FreeBSD-scsi <freebsd-scsi@freebsd.org>,
 Edward Tomasz Napierała <trasz@FreeBSD.org>
X-Mailer: Apple Mail (2.3124)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Oct 2017 10:09:00 -0000

> On 02 Oct 2017, at 20:12, Ben RUBSON <ben.rubson@gmail.com> wrote:
>
> Hi,
>
> On a FreeBSD 11 server, I have the following online/healthy zpool:
>
> home
> mirror-0
>   label/local1
>   label/local2
>   label/iscsi1
>   label/iscsi2
> mirror-1
>   label/local3
>   label/local4
>   label/iscsi3
>   label/iscsi4
> cache
>   label/local5
>   label/local6
>
> There is a sustained read throughput of 180 MB/s, 45 MB/s on each iSCSI disk
> according to "zpool iostat", and nothing on the local disks.
> No write IOs.
>
> Let's disconnect all iSCSI disks:
> iscsictl -Ra
>
> Expected behavior:
> IO activity continues flawlessly on the local disks.
>
> What happened:
> All IOs stalled; the server only answers IOs made to its zroot pool.
> All commands related to the iSCSI disks (iscsictl) or to ZFS (zfs/zpool)
> don't return.
>
> Questions:
> Why this behavior?
> How can I find out what is happening? (/var/log/messages says almost nothing)
>
> I have already disconnected the iSCSI disks without any issue in the past,
> several times, but there were almost no IOs running.
>
> Thank you for your help!
>
> Ben

Hello,

First, many thanks again to Andriy: we spent almost 3 hours debugging the
stalled server to find the root cause of the issue.

It sounds like I need help from the iSCSI developers (Edward, perhaps?), as the
issue seems to be on their side.

Here is Andriy's conclusion after the debug session; I quote him:

> So, it seems that the root cause of all evil is this outstanding zio (it might
> not be the only one).
> In other words, it looks like the iscsi stack bailed out without completing all
> outstanding i/o requests that it had.
> It should either return success or error for every request; it cannot simply
> drop a request.
> And that appears to be what happened here.

> It looks like ZFS is fragile in the face of this type of error.
> Essentially, each logical i/o request obtains a configuration lock of type 'zio'
> in shared mode to prevent certain configuration changes from happening while
> there are any outstanding zio-s.
> If a zio is lost, then this lock is leaked.
> Then, the code that deals with vdev failures tries to take this lock in
> exclusive mode while holding a few other configuration locks, also in exclusive
> mode, so any other thread needing those locks would block.
> And there are code paths where a configuration lock is taken while
> spa_namespace_lock is held.
> And when spa_namespace_lock is never dropped, the system is close to toast,
> because all pool lookups would get stuck.
> I don't see how this can be fixed in ZFS.

> It seems that when the initiator is being removed, it doesn't properly terminate
> in-flight requests.
> It would be interesting to see what happens if you test other scenarios.

So I tested the following other scenarios:
1 - drop all iSCSI traffic using ipfw on the target
2 - ifdown the iSCSI NIC on the target
3 - ifdown the iSCSI NIC on the initiator
4 - stop ctld (on the target, of course)

I tested each of them several times (5 or 6 times each).

I managed to trigger a kernel panic (!) twice:
the first time in case 2,
the second time in case 4.
The other test cases may well be able to trigger a panic too.

Stack traces:
https://s1.postimg.org/2hfdpsvban/panic_case2.png
https://s1.postimg.org/2ac5ud9t0f/panic_case4.png

(kgdb) list *g_io_request+0x4a7
0xffffffff80a14dc7 is in g_io_request (/usr/src/sys/geom/geom_io.c:638).
633			g_bioq_unlock(&g_bio_run_down);
634			/* Pass it on down. */
635			if (first)
636				wakeup(&g_wait_down);
637		}
638	}
639
640	void
641	g_io_deliver(struct bio *bp, int error)
642	{

I had some kernel panics on the same servers a few months ago, after
losing iSCSI targets that were used in a gmirror with local disks.
gmirror should have continued to work flawlessly (as ZFS should have)
using the local disks, but the server crashed.

Stack traces:
https://s1.postimg.org/14v4sabhv3/panic_g_destroy1.png
https://s1.postimg.org/437evsk6rz/panic_g_destroy2.png
https://s1.postimg.org/8pt1whiy5b/panic_g_destroy3.png

(kgdb) list *g_destroy_consumer+0x53
0xffffffff80a18563 is in g_destroy_consumer (geom.h:369).
364			KASSERT(g_valid_obj(ptr) == 0,
365			    ("g_free(%p) of live object, type %d", ptr,
366			    g_valid_obj(ptr)));
367		}
368	#endif
369		free(ptr, M_GEOM);
370	}
371
372	#define g_topology_lock() 					\
373		do {							\

> I think that all problems that you have seen are different sides of the same
> underlying issue.  It looks like iscsi does not properly depart from geom and
> leaves behind some dangling pointers...
>
> The panics you got today most likely occurred here:
> 	bp->bio_to->geom->start(bp);
>
> And the most likely reason is that bio_to points to a destroyed geom provider.
>
> I wonder if you'd be able to get into direct contact with a developer
> responsible for iscsi in FreeBSD.  I think that it is a relatively recent
> addition and it was under a FreeBSD Foundation project.  So, I'd expect that the
> developer should be responsive.

Feel free to contact me if needed, so that we can take this further!

Thank you very much for your help,

Ben