From owner-freebsd-fs@freebsd.org  Mon Jul  3 15:40:50 2017
From: Ben RUBSON <ben.rubson@gmail.com>
Message-Id: <D9AE9CA6-D05C-4909-A56E-1CE6D149E71D@gmail.com>
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: I/O to pool appears to be hung, panic !
Date: Mon, 3 Jul 2017 17:40:46 +0200
References: <E8CC223E-3F41-4036-84A9-FBA693AC2CAA@gmail.com>
 <20170629144334.1e283570@fabiankeil.de>
 <A1CC7D73-4196-4503-9716-52E84AA24FD3@gmail.com>
 <C584B1DF-AC6E-4E77-9497-3D0EED76EACF@gmail.com>
 <CAFLM3-qpsGx=EYHxAaLsSzF22JAJx0zg8deJ3FX_ec5uDO=0Cw@mail.gmail.com>
 <1F414ECE-1856-4EA3-A141-88B64703D4D6@gmail.com>
 <CAFLM3-pzOMHmd4PVvZRxe6GnmdpH2-tTAQXjhw8MuU9Y1-oRxQ@mail.gmail.com>
To: Freebsd fs <freebsd-fs@freebsd.org>,
 freebsd-scsi <freebsd-scsi@freebsd.org>
In-Reply-To: <CAFLM3-pzOMHmd4PVvZRxe6GnmdpH2-tTAQXjhw8MuU9Y1-oRxQ@mail.gmail.com>

> On 03 Jul 2017, at 17:27, Edward Napierala <trasz@freebsd.org> wrote:
>
> 2017-07-03 14:36 GMT+01:00 Ben RUBSON <ben.rubson@gmail.com>:
> > On 03 Jul 2017, at 13:10, Edward Napierala <trasz@freebsd.org> wrote:
> >
> > 2017-07-03 10:07 GMT+01:00 Ben RUBSON <ben.rubson@gmail.com>:
> >
> > > On 29 Jun 2017, at 15:36, Ben RUBSON <ben.rubson@gmail.com> wrote:
> > >
> > >> On 29 Jun 2017, at 14:43, Fabian Keil <freebsd-listen@fabiankeil.de> wrote:
> > >
> > > Thank you for your feedback Fabian.
> > >
> > >> Ben RUBSON <ben.rubson@gmail.com> wrote:
> > >>
> > >>> One of my servers did a kernel panic last night, giving the following message :
> > >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> > >> [...]
> > >>> Here are some numbers regarding this disk, taken from the server hosting the pool :
> > >>> (unfortunately not from the iscsi target server)
> > >>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> > >>>
> > >>> We clearly see that suddenly, disk became 100% busy, meanwhile CPU was almost idle.
> >
> > We also clearly see that 5 minutes later (02:09) disk seems to be back but became 100% busy again,
> > and that 16 minutes later (default vfs.zfs.deadman_synctime_ms), panic occurred.
> >
> > >>> No error message at all on both servers.
> > >> [...]
> > >>> The only log I have is the following stacktrace taken from the server console :
> > >>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
> > >>> cpuid = 0
> > >>> KDB: stack backtrace:
> > >>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
> > >>> #1 0xffffffff80ad9462 at vpanic+0x182
> > >>> #2 0xffffffff80ad92d3 at panic+0x43
> > >>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
> > >>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
> > >>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
> > >>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
> > >>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
> > >>> #8 0xffffffff80af3854 at softclock+0x94
> > >>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
> > >>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
> > >>> #11 0xffffffff80a900d5 at fork_exit+0x85
> > >>> #12 0xffffffff80f846fe at fork_trampoline+0xe
> > >>> Uptime: 92d2h47m6s
> > >>>
> > >>> I would have been pleased to make a dump available.
> > >>> However, despite my (correct ?) configuration, server did not dump :
> > >>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
> > >>> # grep ^dump /boot/loader.conf /etc/rc.conf
> > >>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
> > >>> /etc/rc.conf:dumpdev="AUTO"
> > >>
> > >> You may want to look at the NOTES section in gmirror(8).
> > >
> > > Yes, I should already be OK (prefer algorithm set).
> > >
> > >>> I use default kernel, with a rebuilt zfs module :
> > >>> # uname -v
> > >>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
> > >>>
> > >>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they are unavailable :
> > >>> kern.iscsi.ping_timeout=5
> > >>> kern.iscsi.fail_on_disconnection=1
> > >>> kern.iscsi.iscsid_timeout=5
> > >>>
> > >>> I then think disk was at least correctly reachable during these 20 busy minutes.
> > >>>
> > >>> So, any idea why I could have faced this issue ?
> > >>
> > >> Is it possible that the system was under memory pressure?
> > >
> > > No I don't think it was :
> > > https://s1.postimg.org/uvsebpyyn/busydisk2.png
> > > More than 2GB of available memory.
> > > Swap not used (624kB).
> > > ARC behaviour seems correct (anon increases because ZFS can't actually write I think).
> > > Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to disks.
> > > When disk went busy, throughput fell to some kB, with 128kB blocks.
> > >
> > >> geli's use of malloc() is known to cause deadlocks under memory pressure:
> > >> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
> > >>
> > >> Given that gmirror uses malloc() as well it probably has the same issue.
> > >
> > > I don't use geli so I should not face this issue.
> > >
> > >>> I would have thought ZFS would have taken the busy device offline, instead of raising a panic.
> > >>> Perhaps it is already possible to make ZFS behave like this ?
> > >>
> > >> There's a tunable for this: vfs.zfs.deadman_enabled.
> > >> If the panic is just a symptom of the deadlock it's unlikely
> > >> to help though.
> > >
> > > I think this tunable should have prevented the server from raising a panic :
> > > # sysctl -d vfs.zfs.deadman_enabled
> > > vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> > > # sysctl vfs.zfs.deadman_enabled
> > > vfs.zfs.deadman_enabled: 1
> > >
> > > But not sure how it would have behaved then...
> > > (busy disk miraculously back to normal status, memory pressure due to anon increasing...)
> >
> > I then think it would be nice, once vfs.zfs.deadman_synctime_ms has expired,
> > to be able to take the busy device offline instead of raising a panic.
> > Currently, disabling deadman will avoid the panic but will let the device slow down the pool.
> >
> > I still have not found the root cause of this issue, not sure I will,
> > quite difficult actually with a stacktrace and some performance graphs only :/
> >
> > What exactly is the disk doing when that happens?  What does "gstat" say?  If the iSCSI
> > target is also FreeBSD, what does ctlstat say?
>
> As shown on this graph made with gstat numbers from initiator :
> https://s23.postimg.org/zd8jy9xaj/busydisk.png
> The disk is continuously writing 3 MBps before the issue happens.
> When it occurs, response time increases to around 30 seconds (100% busy),
> and consequently disk throughput drops down to some kBps.
> CPU stays at an almost fully idle level.
>
> As shown here, no memory pressure :
> https://s1.postimg.org/uvsebpyyn/busydisk2.png
>
> At the end of graphs' lines, panic is raised.
>
> iSCSI target is also FreeBSD, unfortunately ctlstat was not running when the issue occurred.
> So numbers will be average since system startup (102 days ago).
> I also do not have gstat numbers from this disk on target side
> (to help find out whether it's a hardware issue, an iSCSI issue or something else).
> I will think about collecting these numbers if ever the issue occurs again.
>
> It's kind of hard to say something definitive at this point, but I suspect it's a problem
> at the target side.  I got a report about something quite similar some two years ago,
> and it turned out to be a problem with a disk controller on the target.

Thank you for your feedback.
I have now:
- enabled gstat collection on the target, so that I have numbers from the target as well as from the initiator;
- enabled controller logging (dev.mps.0.debug_level=0x1B);
- disabled the deadman.
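
For reference, a minimal sketch of how such a collection could be timestamped, so that samples from the target and the initiator can be lined up afterwards. The function name, the log path and the sampling interval are assumptions made for illustration, not the setup actually running here; only "gstat -b" (batch mode) and "-I" (interval) are taken from gstat(8).

```shell
#!/bin/sh
# Prefix every line read from stdin with a UTC timestamp, so that samples
# logged on two different hosts can be correlated later.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
    done
}

# Hypothetical usage on the target (path and interval are assumptions):
#   while :; do
#       gstat -b -I 1s | stamp_lines >> /var/log/gstat.log
#       sleep 9
#   done
```

Using UTC on both hosts avoids having to reason about time zones when comparing the two logs.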

We should be able to investigate further if the issue occurs again.
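
As a rough idea of what that investigation could look like, here is a small sketch that pulls the busy samples out of a collected log. It assumes each logged line is a timestamp followed by the usual "gstat -b" columns, with %busy as the second-to-last field and the provider name as the last one; the function name and the default threshold of 90 are hypothetical.

```shell
#!/bin/sh
# Print "<timestamp> <name> <%busy>" for every sample whose %busy is at or
# above the given threshold (default 90). Header lines and any line whose
# second-to-last field is not numeric are skipped.
busy_samples() {
    awk -v min="${1:-90}" '
        NF >= 3 && $(NF-1) ~ /^[0-9.]+$/ && $(NF-1) + 0 >= min + 0 {
            print $1, $NF, $(NF-1)
        }
    '
}

# Hypothetical usage (log path is an assumption):
#   busy_samples 95 < /var/log/gstat.log
```

Running the same filter on the target and initiator logs should show whether the disk is already saturated on the target side when the initiator sees 100% busy.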

Of course, feel free to let me know if you have other ideas!

Thank you again,

Ben