Date:      Sun, 12 Dec 2021 09:45:06 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        FreeBSD User <freebsd@walstatt-de.de>
Cc:        FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject:   Re: CURRENT: ZFS freezes system beyond reboot
Message-ID:  <CAOtMX2hjGJHqvmFDarQ=bCen_hXtkeOkZj4KuFEuSfiWCsu17Q@mail.gmail.com>
In-Reply-To: <20211212102032.08af9689@jelly.fritz.box>
References:  <20211212102032.08af9689@jelly.fritz.box>

On Sun, Dec 12, 2021 at 2:22 AM FreeBSD User <freebsd@walstatt-de.de> wrote:
>
> Running CURRENT (FreeBSD 14.0-CURRENT #52 main-n251260-156fbc64857: Thu
> Dec  2 14:45:55 CET 2021 amd64), all of a sudden the ZFS RAIDZ pool
> reported an error:
>
> Solaris: WARNING: Pool 'POOL00' has encountered an uncorrectable I/O
> failure and has been suspended.
>
> The system no longer responds on that pool, transactions to and
> from that pool are frozen, and the system is 99.9% idle.
> The most "not so funny" part is: the box doesn't even react to a
> "shutdown -r now" or a brute force "reboot". I can still log in via ssh,
> but any action involving the ZFS pool freezes the console/terminal.
>
> ZFS very often renders the system unresponsive forever. How can this
> be mitigated? The system in question is at a remote site, and the
> problem does not seem to be limited to CURRENT; we have seen similar
> problems on 13-STABLE as well.
>
> What can I do to "unfreeze" ZFS? The main OS is, luckily, on a
> UFS/FFS filesystem and so is not affected by this problem.
>
> By the way, here are some more details, as far as I can gather them:
>
> zpool clear POOL00
> cannot clear errors for POOL00: I/O error
>
> Whatever took out the ZFS pool (I cannot see any hardware errors; the
> pool serves several services, especially a poudriere build system, and
> is under heavy load all the time; the box has 16 GB RAM), it also renders
> the rest of the system unusable in a way that even a "reboot" cannot fix.
>
> Kind regards,
> oh

You need to look at what's causing those errors.  What kind of disks
are you using, with what HBA?  It's not surprising that any access to
ZFS hangs; that's what it's designed to do when a pool is suspended.
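What a suspended pool does with I/O is governed by the pool's `failmode` property, whose documented values are `wait` (the default: block all I/O until the devices recover and `zpool clear` succeeds), `continue` (return EIO to new writes while still allowing reads from remaining healthy devices) and `panic` (print a console message and generate a crash dump). The sketch below is a toy model of those three behaviours, not ZFS code; the function `outcome()` and its `healthy_copy` flag are hypothetical simplifications (for example, it ignores already-queued writes, which stay blocked even under `continue`).

```python
from enum import Enum


class FailMode(Enum):
    """Values accepted by the zpool `failmode` property."""
    WAIT = "wait"          # default: block all I/O until `zpool clear` succeeds
    CONTINUE = "continue"  # new writes get EIO; reads from healthy devices proceed
    PANIC = "panic"        # console message plus crash dump instead of suspending


def outcome(mode: FailMode, op: str, healthy_copy: bool = False) -> str:
    """Toy model (hypothetical, not ZFS code) of what a process issuing a
    *new* I/O request sees after the pool hit an uncorrectable failure.

    op is "read" or "write"; healthy_copy says whether a read could still
    be served from a device that is healthy.
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown op: {op!r}")
    if mode is FailMode.PANIC:
        return "panic"     # the whole system goes down instead
    if mode is FailMode.WAIT:
        return "block"     # the caller hangs, as described in this thread
    # FailMode.CONTINUE: fail fast instead of hanging
    if op == "write":
        return "EIO"
    return "ok" if healthy_copy else "EIO"


if __name__ == "__main__":
    for mode in FailMode:
        print(mode.value, outcome(mode, "read", healthy_copy=True), outcome(mode, "write"))
```

Under the default `wait` mode every access blocks, which matches the hangs reported above. The current setting can be inspected with `zpool get failmode POOL00` and changed with `zpool set failmode=continue POOL00`. Whether `continue` is preferable for an unattended remote box is a judgment call: it turns hangs into I/O errors but does not make a faulted pool usable again.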


