From: Ben RUBSON <ben.rubson@gmail.com>
Subject: Re: I/O to pool appears to be hung, panic !
Date: Mon, 3 Jul 2017 11:07:06 +0200
To: Freebsd fs <freebsd-fs@freebsd.org>, freebsd-scsi@freebsd.org

> On 29 Jun 2017, at 15:36, Ben RUBSON wrote:
> 
>> On 29 Jun 2017, at 14:43, Fabian Keil wrote:
> 
> Thank you for your feedback Fabian.
> 
>> Ben RUBSON wrote:
>> 
>>> One of my servers did a kernel panic last night, giving the following message:
>>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
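
(For reference, the delay before this panic fires is governed by vfs.zfs.deadman_synctime_ms; on a stock FreeBSD 11 system it should read as follows, 1000000 ms being roughly the 16 minutes observed below:

# sysctl vfs.zfs.deadman_synctime_ms
vfs.zfs.deadman_synctime_ms: 1000000
)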
>> [...]
>>> Here are some numbers regarding this disk, taken from the server hosting the pool:
>>> (unfortunately not from the iSCSI target server)
>>> https://s23.postimg.org/zd8jy9xaj/busydisk.png
>>> 
>>> We clearly see that the disk suddenly became 100% busy while the CPU was almost idle.
>>> We also clearly see that 5 minutes later (02:09) the disk seemed to come back, but became 100% busy again, and that 16 minutes later (the default vfs.zfs.deadman_synctime_ms) the panic occurred.
>>> No error message at all on either server.
>> [...]
>>> The only log I have is the following stacktrace, taken from the server console:
>>> panic: I/O to pool 'home' appears to be hung on vdev guid 122... at '/dev/label/G23iscsi'.
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> #0 0xffffffff80b240f7 at kdb_backtrace+0x67
>>> #1 0xffffffff80ad9462 at vpanic+0x182
>>> #2 0xffffffff80ad92d3 at panic+0x43
>>> #3 0xffffffff82238fa7 at vdev_deadman+0x127
>>> #4 0xffffffff82238ec0 at vdev_deadman+0x40
>>> #5 0xffffffff82238ec0 at vdev_deadman+0x40
>>> #6 0xffffffff8222d0a6 at spa_deadman+0x86
>>> #7 0xffffffff80af32da at softclock_call_cc+0x18a
>>> #8 0xffffffff80af3854 at softclock+0x94
>>> #9 0xffffffff80a9348f at intr_event_execute_handlers+0x20f
>>> #10 0xffffffff80a936f6 at ithread_loop+0xc6
>>> #11 0xffffffff80a900d5 at fork_exit+0x85
>>> #12 0xffffffff80f846fe at fork_trampoline+0xe
>>> Uptime: 92d2h47m6s
>>> 
>>> I would have been pleased to make a dump available.
>>> However, despite my (correct?) configuration, the server did not dump:
>>> (nevertheless, "sysctl debug.kdb.panic=1" does make it dump)
>>> # grep ^dump /boot/loader.conf /etc/rc.conf
>>> /boot/loader.conf:dumpdev="/dev/mirror/swap"
>>> /etc/rc.conf:dumpdev="AUTO"
>> 
>> You may want to look at the NOTES section in gmirror(8).
> 
> Yes, I should already be OK (prefer balance algorithm set).
> 
>>> I use the default kernel, with a rebuilt ZFS module:
>>> # uname -v
>>> FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>>> 
>>> I use the following iSCSI configuration, which disconnects the disks "as soon as" they become unavailable:
>>> kern.iscsi.ping_timeout=5
>>> kern.iscsi.fail_on_disconnection=1
>>> kern.iscsi.iscsid_timeout=5
>>> 
>>> I therefore think the disk was at least correctly reachable during these 20 busy minutes.
>>> 
>>> So, any idea why I could have faced this issue?
>> 
>> Is it possible that the system was under memory pressure?
> 
> No, I don't think it was:
> https://s1.postimg.org/uvsebpyyn/busydisk2.png
> More than 2GB of available memory.
> Swap not used (624kB).
> ARC behaviour seems correct (anon increases because ZFS can't actually write, I think).
> Regarding the pool itself, it was receiving data at 6MB/s, sending around 30kB blocks to the disks.
> When the disk went busy, throughput fell to a few kB, with 128kB blocks.
> 
>> geli's use of malloc() is known to cause deadlocks under memory pressure:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759
>> 
>> Given that gmirror uses malloc() as well, it probably has the same issue.
> 
> I don't use geli, so I should not face this issue.
> 
>>> I would have thought ZFS would have taken the busy device offline instead of raising a panic.
>>> Perhaps it is already possible to make ZFS behave like this?
>> 
>> There's a tunable for this: vfs.zfs.deadman_enabled.
>> If the panic is just a symptom of the deadlock it's unlikely
>> to help though.
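
For anyone wanting to try it: a minimal sketch, untested on my side, and assuming the OID is a boot-time tunable as on stock 11.0, would be to add this to /boot/loader.conf and reboot:

vfs.zfs.deadman_enabled=0

The pool would then simply hang on the stalled vdev instead of panicking the machine.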
> 
> I think this tunable, if disabled, would have prevented the server from raising a panic:
> # sysctl -d vfs.zfs.deadman_enabled
> vfs.zfs.deadman_enabled: Kernel panic on stalled ZFS I/O
> # sysctl vfs.zfs.deadman_enabled
> vfs.zfs.deadman_enabled: 1
> 
> But I am not sure how the system would have behaved then...
> (busy disk miraculously back to normal status, memory pressure due to anon increasing...)

I therefore think it would be nice, once vfs.zfs.deadman_synctime_ms has expired, to be able to take the busy device offline instead of raising a panic.
Currently, disabling the deadman will avoid the panic but will leave the device slowing down the pool.

I still have not found the root cause of this issue, and I am not sure I will; it is quite difficult with only a stacktrace and some performance graphs :/

Thank you again,

Ben
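
P.S. Until ZFS can offline a stalled vdev itself, a manual workaround sketch, assuming the pool has enough redundancy left and the slow disk has been identified, would be:

# zpool offline home label/G23iscsi
# (fix or replace the disk / iSCSI target, then)
# zpool online home label/G23iscsi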