Date: Fri, 6 Oct 2017 12:08:55 +0200
From: Ben RUBSON <ben.rubson@gmail.com>
To: FreeBSD-scsi <freebsd-scsi@freebsd.org>, Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Freebsd fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS stalled after some mirror disks were lost
Message-ID: <82632887-E9D4-42D0-AC05-3764ABAC6B86@gmail.com>
In-Reply-To: <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>
References: <4A0E9EB8-57EA-4E76-9D7E-3E344B2037D2@gmail.com> <DDCFAC80-2D72-4364-85B2-7F4D7D70BCEE@gmail.com>
> On 02 Oct 2017, at 20:12, Ben RUBSON <ben.rubson@gmail.com> wrote:
> 
> Hi,
> 
> On a FreeBSD 11 server, the following online/healthy zpool :
> 
> home
>   mirror-0
>     label/local1
>     label/local2
>     label/iscsi1
>     label/iscsi2
>   mirror-1
>     label/local3
>     label/local4
>     label/iscsi3
>     label/iscsi4
> cache
>   label/local5
>   label/local6
> 
> A sustained read throughput of 180 MB/s, 45 MB/s on each iSCSI disk
> according to "zpool iostat", nothing on the local disks.
> No write IOs.
> 
> Let's disconnect all iSCSI disks :
> iscsictl -Ra
> 
> Expected behavior :
> IO activity flawlessly continues on the local disks.
> 
> What happened :
> All IOs stalled, the server only answers to IOs made to its zroot pool.
> All commands related to the iSCSI disks (iscsictl) or to ZFS (zfs/zpool)
> do not return.
> 
> Questions :
> Why this behavior ?
> How to know what happens ? (/var/log/messages says almost nothing)
> 
> I already disconnected the iSCSI disks without any issue in the past,
> several times, but there were almost no IOs running.
> 
> Thank you for your help !
> 
> Ben

Hello,

So first, many thanks again to Andriy, we spent almost 3 hours debugging
the stalled server to find the root cause of the issue.

Sounds like I will need help from the iSCSI dev team (Edward perhaps ?),
as the issue seems to be on that side.

Here is Andriy's conclusion after the debug session, I quote him :

> So, it seems that the root cause of all evil is this outstanding zio (it
> might not be the only one).
> In other words, it looks like the iscsi stack bailed out without
> completing all outstanding i/o requests that it had.
> It should either return success or error for every request, it can not
> simply drop a request.
> And that appears to be what happened here.
> It looks like ZFS is fragile in the face of this type of error.
> Essentially, each logical i/o request obtains a configuration lock of
> type 'zio' in shared mode, to prevent certain configuration changes from
> happening while there are any outstanding zio-s.
> If a zio is lost, then this lock is leaked.
> Then, the code that deals with vdev failures tries to take this lock in
> exclusive mode, while holding a few other configuration locks also in
> exclusive mode, so any other thread needing those locks would block.
> And there are code paths where a configuration lock is taken while
> spa_namespace_lock is held.
> And when spa_namespace_lock is never dropped, the system is close to
> toast, because all pool lookups would get stuck.
> I don't see how this can be fixed in ZFS.
> It seems that when the initiator is being removed, it doesn't properly
> terminate in-flight requests.
> It would be interesting to see what happens if you test other scenarios.
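To make the leaked-lock deadlock concrete, here is a minimal userspace
analogy. This is only a sketch, not the actual ZFS code (ZFS takes the
SCL_ZIO configuration lock through spa_config_enter()/spa_config_exit(),
not a plain rwlock): a shared hold that is never released blocks the
exclusive taker forever, along with everything queued behind it.

/*
 * Userspace analogy of the leaked SCL_ZIO config lock (NOT ZFS code).
 * Build with: cc -o lockleak lockleak.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t config_lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * A zio whose completion is dropped by the transport: the shared
 * (reader) hold on the config lock is taken but never released.
 */
static void *
lost_zio(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&config_lock);	/* "zio" starts... */
	pause();	/* ...but never completes, so no matching unlock */
	return (NULL);
}

int
main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, lost_zio, NULL);
	sleep(1);	/* let the "zio" take its shared hold */

	/*
	 * The vdev-failure path wants the lock exclusively (and in ZFS
	 * it already holds other config locks exclusively too): it
	 * blocks forever, and so does every thread queued behind it.
	 */
	printf("taking the config lock exclusively...\n");
	pthread_rwlock_wrlock(&config_lock);
	printf("never reached\n");
	return (0);
}

If one of those blocked threads also holds spa_namespace_lock, pool
lookups stall as well, which matches what we saw on the stalled server.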
So I tested the following other scenarios :

1 - drop all iSCSI traffic using ipfw on the target
2 - ifdown the iSCSI NIC on the target
3 - ifdown the iSCSI NIC on the initiator
4 - stop ctld (on the target of course)

I tested all of them several times, 5 or 6 times each.
I managed to kernel panic (!) 2 times.
First time in case 2.
Second time in case 4.
I'm not sure the other test cases could not have led to a panic too, though.

Stack traces :
https://s1.postimg.org/2hfdpsvban/panic_case2.png
https://s1.postimg.org/2ac5ud9t0f/panic_case4.png

(kgdb) list *g_io_request+0x4a7
0xffffffff80a14dc7 is in g_io_request (/usr/src/sys/geom/geom_io.c:638).
633			g_bioq_unlock(&g_bio_run_down);
634			/* Pass it on down. */
635			if (first)
636				wakeup(&g_wait_down);
637		}
638	}
639	
640	void
641	g_io_deliver(struct bio *bp, int error)
642	{

I had some kernel panics on the same servers a few months ago, losing
iSCSI targets which were used in a gmirror together with local disks.
gmirror should have continued to work flawlessly on the local disks
(just as ZFS should have), but the server crashed.

Stack traces :
https://s1.postimg.org/14v4sabhv3/panic_g_destroy1.png
https://s1.postimg.org/437evsk6rz/panic_g_destroy2.png
https://s1.postimg.org/8pt1whiy5b/panic_g_destroy3.png

(kgdb) list *g_destroy_consumer+0x53
0xffffffff80a18563 is in g_destroy_consumer (geom.h:369).
364		KASSERT(g_valid_obj(ptr) == 0,
365		    ("g_free(%p) of live object, type %d", ptr,
366		    g_valid_obj(ptr)));
367	}
368	#endif
369		free(ptr, M_GEOM);
370	}
371	
372	#define	g_topology_lock()					\
373		do {							\

> I think that all the problems that you have seen are different sides of
> the same underlying issue. It looks like iscsi does not properly depart
> from geom and leaves behind some dangling pointers...
> 
> The panics you got today most likely occurred here:
> bp->bio_to->geom->start(bp);
> 
> And the most likely reason is that bio_to points to a destroyed geom
> provider.
> 
> I wonder if you'd be able to get into direct contact with a developer
> responsible for iscsi in FreeBSD. I think that it is a relatively recent
> addition and it was under a FreeBSD Foundation project. So, I'd expect
> that the developer should be responsive.

Feel free then to contact me if you need, so that we can go further on this !

Thank you very much for your help,

Ben