From: Chris Ross
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Subject: Re: ZFS operations hanging, but no visible errors?
Date: Fri, 5 Nov 2021 15:05:25 -0400
To: freebsd-fs
In-Reply-To: <0AABEDF8-665F-465F-9792-0B0BE6CAE97F@distal.com>
Message-Id: <33253C33-9A8D-403B-A7E9-C511EA4ED34A@distal.com>

> On Nov 5, 2021, at 13:12, Chris Ross wrote:
>
> Okay. Despite everything I had running being stuck I was able to log into the console, and coincidentally or not, things have now recovered. Well, the old commands/sessions didn’t, but I can log in again. I can’t get to the tmux session it seems, but.
>
> I’m able to run that sysctl, which has a lot of data. The last records all about two hours ago are:
>
> 1636125429 metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 854, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033313, loaded 5902891 ms ago, max_size 2147475456
> 1636125429 metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 2, ms_id 88, weight 880000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1020497, loaded 864138 ms ago, max_size 17179869184
> 1636125429 metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 859, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033029, loaded 2201252 ms ago, max_size 2147475456
> 1636125429 metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 860, weight 780000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033229, loaded 3395548 ms ago, max_size 2147303424
> 1636125429 metaslab.c:2538:metaslab_unload(): metaslab_unload: txg 1033689, spa tank, vdev_id 1, ms_id 863, weight 7c0000000000001, selected txg 1033574 (601067 ms ago), alloc_txg 1033448, loaded 4046753 ms ago, max_size 4294926336
>
> Not sure if that helps….

Okay. Following up just to close out the “active” state of the issue. The system became unresponsive again moments after the above. The kernel was still functional, as I was able to switch between multiple virtual consoles, but logging in only yielded a “Last login” line, then nothing else. Carriage returns were echoed on the consoles, but nothing else happened.

I issued a Ctrl-Alt-Delete, and it began stopping things, then the 90-second watchdog timer expired and it noted that it was terminating the shutdown abnormally. The kernel did eventually report “All buffers synced.” and then nothing else. After about 10 minutes, I tried Ctrl-Alt-Delete again, and then power-cycled the box.

I’d still be interested in hearing any theories about what happened, but I no longer have the device in this state to test.

- Chris