From: Ben RUBSON <ben.rubson@gmail.com>
Subject: Re: I/O to pool appears to be hung, panic !
Date: Mon, 3 Jul 2017 15:36:40 +0200
To: Freebsd fs, freebsd-scsi
Message-Id: <1F414ECE-1856-4EA3-A141-88B64703D4D6@gmail.com>

> On 03 Jul 2017, at 13:10, Edward Napierala wrote:
>
> 2017-07-03 10:07 GMT+01:00 Ben RUBSON:
>
> > On 29 Jun 2017, at 15:36, Ben RUBSON wrote:
> >
> >> On 29 Jun 2017, at 14:43, Fabian Keil wrote:
> >
> > Thank you for your feedback Fabian.
> >
> >> Ben RUBSON wrote:
> >>
> >>> One of my servers hit a kernel panic last night, giving the following message :
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> >> [...]
> >>> Here are some numbers regarding this disk, taken from the server hosting the pool :
> >>> (unfortunately not from the iSCSI target server)
> >>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> >>>
> >>> We clearly see that suddenly the disk became 100% busy, while the CPU was almost idle.
>
> We also clearly see that 5 minutes later (02:09), the disk seemed to come back but became 100% busy again,
> and that 16 minutes later (the default vfs.zfs.deadman_synctime_ms), the panic occurred.
>
> >>> No error message at all on either server.
> >> [...]
> >>> The only log I have is the following stacktrace, taken from the server console :
> >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> >>> cpuid = 0
> >>> KDB: stack backtrace:
> >>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
> >>> #1 0xffffffff80ad9462 at vpanic+0x182
> >>> #2 0xffffffff80ad92d3 at panic+0x43
> >>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
> >>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
> >>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
> >>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
> >>> #8 0xffffffff80af3854 at softclock+0x94
> >>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
> >>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
> >>> #11 0xffffffff80a900d5 at fork_exit+0x85
> >>> #12 0xffffffff80f846fe at fork_trampoline+0xe
> >>> Uptime: 92d2h47m6s
> >>>
> >>> I would have been pleased to make a dump available.
> >>> However, despite my (correct ?) configuration, the server did not dump :
> >>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
> >>> # grep ^dump /boot/loader.conf /etc/rc.conf
> >>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
> >>> /etc/rc.conf:dumpdev="AUTO"
> >>
> >> You may want to look at the NOTES section in gmirror(8).
> >
> > Yes, I should already be OK (prefer algorithm set).
> >
> >>> I use the default kernel, with a rebuilt zfs module :
> >>> # uname -v
> >>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> >>>
> >>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they are unavailable :
> >>> kern.iscsi.ping_timeout=5
> >>> kern.iscsi.fail_on_disconnection=1
> >>> kern.iscsi.iscsid_timeout=5
> >>>
> >>> I therefore think the disk was at least correctly reachable during these 20 busy minutes.
> >>>
> >>> So, any idea why I could have faced this issue ?
> >>
> >> Is it possible that the system was under memory pressure?
> >
> > No, I don't think it was :
> > https://s1.postimg.org/uvsebpyyn/busydisk2.png
> > More than 2GB of available memory.
> > Swap not used (624kB).
> > ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
> > Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to the disks.
> > When the disk went busy, throughput fell to a few kB/s, with 128kB blocks.
> >
> >> geli's use of malloc() is known to cause deadlocks under memory pressure:
> >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
> >>
> >> Given that gmirror uses malloc() as well, it probably has the same issue.
> >
> > I don't use geli, so I should not face this issue.
> >
> >>> I would have thought ZFS would have taken the busy device offline, instead of raising a panic.
> >>> Perhaps it is already possible to make ZFS behave like this ?
> >>
> >> There's a tunable for this: vfs.zfs.deadman_enabled.
> >> If the panic is just a symptom of the deadlock, it's unlikely
> >> to help though.
> >
> > I think disabling this tunable would have prevented the server from panicking :
> > # sysctl -d vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> > # sysctl vfs.zfs.deadman_enabled
> > vfs.zfs.deadman_enabled: 1
> >
> > But I am not sure how it would have behaved then...
> > (busy disk miraculously back to normal, memory pressure due to anon increasing...)
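In the meantime, I guess the manual workaround would be to disable the deadman and to offline the hung disk by hand. A rough, untested sketch (pool and device names are those of my setup, and offlining is only possible as long as the pool keeps enough redundancy without this vdev) :
# sysctl vfs.zfs.deadman_enabled=0   (or set it in /etc/sysctl.conf)
# zpool offline home /dev/label/G23iscsi
The pool would then run degraded on its remaining disks, instead of panicking or being slowed down by the hung device.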
>
> I then think it would be nice, once vfs.zfs.deadman_synctime_ms has expired,
> to be able to take the busy device offline instead of raising a panic.
> Currently, disabling deadman will avoid the panic but will let the device slow down the pool.
>
> I still have not found the root cause of this issue, and I am not sure I will ;
> it is quite difficult with only a stacktrace and some performance graphs :/
>
> What exactly is the disk doing when that happens? What does "gstat" say? If the iSCSI
> target is also FreeBSD, what does ctlstat say?

As shown on this graph, made with gstat numbers from the initiator :
https://s23.postimg.org/zd8jy9xaj/busydisk.png
the disk was continuously writing 3 MBps before the issue happened.
When it occurred, response time increased to around 30 seconds (100% busy),
and consequently disk throughput dropped to a few kBps.
The CPU stayed almost fully idle.
As shown here, there was no memory pressure :
https://s1.postimg.org/uvsebpyyn/busydisk2.png
At the end of the graphs, the panic was raised.

The iSCSI target is also FreeBSD; unfortunately ctlstat was not running while the issue occurred,
so its numbers are only averages since system startup (102 days ago).
I also do not have gstat numbers from this disk on the target side
(which would help to find out whether it is a hardware issue, an iSCSI issue or something else).
I will think about collecting these numbers if ever the issue occurs again.

> If everything else fails, you might want to do a packet trace using "tcpdump -w" and take
> a look at it using Wireshark; it decodes iSCSI quite nicely.
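To be better prepared next time, I could keep gstat logging continuously on both sides, with something like this (a sketch; interval and log file to be adapted) :
# while :; do gstat -b; sleep 10; done >> /var/log/gstat.log

And for the packet trace, something like this on the initiator should do it (assuming the default iSCSI port 3260, and that the target is reached through em0; both to be adapted) :
# tcpdump -i em0 -s 0 -w /var/tmp/iscsi.pcap port 3260
Wireshark would then decode the iSCSI PDUs from the capture file.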