From: Ben RUBSON
Date: Fri, 6 Oct 2017 12:08:55 +0200
Subject: Re: ZFS stalled after some mirror disks were lost
To: FreeBSD-scsi, Edward Tomasz Napierała
Cc: Freebsd fs

> On 02 Oct 2017, at 20:12, Ben RUBSON wrote:
>
> Hi,
>
> On a FreeBSD 11 server, the following online/healthy zpool:
>
> home
>   mirror-0
>     label/local1
>     label/local2
>     label/iscsi1
>     label/iscsi2
>   mirror-1
>     label/local3
>     label/local4
>     label/iscsi3
>     label/iscsi4
>   cache
>     label/local5
>     label/local6
>
> A sustained read throughput of 180 MB/s, 45 MB/s on each iSCSI disk
> according to "zpool iostat", nothing on the local disks.
> No write IOs.
>
> Let's disconnect all iSCSI disks:
> iscsictl -Ra
>
> Expected behavior:
> IO activity continues flawlessly on the local disks.
>
> What happened:
> All IOs stalled; the server only answers IOs made to its zroot pool.
> All commands related to the iSCSI disks (iscsictl) or to ZFS (zfs/zpool)
> don't return.
>
> Questions:
> Why this behavior?
> How can I find out what is happening? (/var/log/messages says almost nothing)
>
> I have already disconnected the iSCSI disks several times in the past
> without any issue, but there were almost no IOs running.
>
> Thank you for your help!
>
> Ben

Hello,

So first, many thanks again to Andriy: we spent almost 3 hours debugging
the stalled server to find the root cause of the issue.

It sounds like I need help from the iSCSI developers (Edward, perhaps?), as
the issue seems to be on that side.

Here is Andriy's conclusion after the debug session; I quote him:

> So, it seems that the root cause of all evil is this outstanding zio (it might
> not be the only one).
> In other words, it looks like the iscsi stack bailed out without completing all
> outstanding i/o requests that it had.
> It should either return success or error for every request; it cannot simply
> drop a request.
> And that appears to be what happened here.
> It looks like ZFS is fragile in the face of this type of error.
> Essentially, each logical i/o request obtains a configuration lock of type 'zio'
> in shared mode to prevent certain configuration changes from happening while
> there are any outstanding zio-s.
> If a zio is lost, then this lock is leaked.
> Then, the code that deals with vdev failures tries to take this lock in
> exclusive mode while holding a few other configuration locks, also in exclusive
> mode, so any other thread needing those locks would block.
> And there are code paths where a configuration lock is taken while
> spa_namespace_lock is held.
> And when spa_namespace_lock is never dropped, the system is close to toast,
> because all pool lookups would get stuck.
> I don't see how this can be fixed in ZFS.
> It seems that when the initiator is being removed it doesn't properly terminate
> in-flight requests.
> It would be interesting to see what happens if you test other scenarios.

So I tested the following other scenarios:
1 - drop all iSCSI traffic using ipfw on the target
2 - ifdown the iSCSI NIC on the target
3 - ifdown the iSCSI NIC on the initiator
4 - stop ctld (on the target, of course)

I tested each of them several times (5 or 6 times each).
I managed to trigger a kernel panic (!) twice: the first time in case 2,
the second time in case 4.
I cannot rule out that the other test cases could panic too, though.

Stack traces:
https://s1.postimg.org/2hfdpsvban/panic_case2.png
https://s1.postimg.org/2ac5ud9t0f/panic_case4.png

(kgdb) list *g_io_request+0x4a7
0xffffffff80a14dc7 is in g_io_request (/usr/src/sys/geom/geom_io.c:638).
633 g_bioq_unlock(&g_bio_run_down);
634 /* Pass it on down. */
635 if (first)
636 wakeup(&g_wait_down);
637 }
638 }
639
640 void
641 g_io_deliver(struct bio *bp, int error)
642 {

I had some kernel panics on the same servers a few months ago, when losing
iSCSI targets that were used in a gmirror together with local disks.
gmirror should have continued to work flawlessly on the local disks (as ZFS
should have), but the server crashed.

Stack traces:
https://s1.postimg.org/14v4sabhv3/panic_g_destroy1.png
https://s1.postimg.org/437evsk6rz/panic_g_destroy2.png
https://s1.postimg.org/8pt1whiy5b/panic_g_destroy3.png

(kgdb) list *g_destroy_consumer+0x53
0xffffffff80a18563 is in g_destroy_consumer (geom.h:369).
364 KASSERT(g_valid_obj(ptr) == 0,
365 ("g_free(%p) of live object, type %d", ptr,
366 g_valid_obj(ptr)));
367 }
368 #endif
369 free(ptr, M_GEOM);
370 }
371
372 #define g_topology_lock() \
373 do { \

> I think that all problems that you have seen are different sides of the same
> underlying issue. It looks like iscsi does not properly depart from geom and
> leaves behind some dangling pointers...
>
> The panics you got today most likely occurred here:
> bp->bio_to->geom->start(bp);
>
> And the most likely reason is that bio_to points to a destroyed geom provider.
>
> I wonder if you'd be able to get into direct contact with a developer
> responsible for iscsi in FreeBSD. I think that it is a relatively recent
> addition and it was under a FreeBSD Foundation project. So, I'd expect that the
> developer should be responsive.

Feel free to contact me if needed, so that we can dig further into this!

Thank you very much for your help,

Ben