Date: Sat, 1 Feb 2025 09:57:15 +0100
From: A FreeBSD User <freebsd@walstatt-de.de>
To: Allan Jude <allanjude@freebsd.org>
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS: Rescue FAULTED Pool
Message-ID: <20250201095656.1bdfbe5f@thor.sb211.local>
In-Reply-To: <980401eb-f8f6-44c7-8ee1-5ff0c9e1c35c@freebsd.org>
References: <20250129112701.0c4a3236@freyja> <Z5oU1dLX4eQaN8Yq@albert.catwhisker.org> <20250130123354.2d767c7c@thor.sb211.local> <980401eb-f8f6-44c7-8ee1-5ff0c9e1c35c@freebsd.org>
On Thu, 30 Jan 2025 16:13:56 -0500, Allan Jude <allanjude@freebsd.org> wrote:

> On 1/30/2025 6:35 AM, A FreeBSD User wrote:
> > On Wed, 29 Jan 2025 03:45:25 -0800, David Wolfskill <david@catwhisker.org> wrote:
> >
> > Hello, thanks for responding.
> >
> >> On Wed, Jan 29, 2025 at 11:27:01AM +0100, FreeBSD User wrote:
> >>> Hello,
> >>>
> >>> a ZFS pool (RAIDZ1) has been faulted. The pool is not importable
> >>> anymore, neither with import -F nor -f.
> >>> Although this pool is on an experimental system (no backup available),
> >>> it contains some data; reconstructing it would take a while, so I'd
> >>> like to ask whether there is a way to try to "de-fault" such a pool.
> >>
> >> Well, 'zpool clear ...' "Clears device errors in a pool." (from "man
> >> zpool").
> >>
> >> It is, however, not magic -- it doesn't actually fix anything.
> >
> > For the record: I tried EVERY network/search method available and useful for common
> > "administrators", but hoped people are able to manipulate deeper stuff via zdb ...
> >
> >>
> >> (I had an issue with a zpool which had a single SSD device as a ZIL; the
> >> ZIL device failed after it had accepted some data to be written to the
> >> pool, but before the data could be read and transferred to the spinning
> >> disks. ZFS was quite unhappy about that. I was eventually able to copy
> >> the data elsewhere, destroy the old zpool, recreate it *without* that
> >> single point of failure, then copy the data back. And I learned to
> >> never create a zpool with a *single* device as a separate ZIL.)
> >
> > Well, in this case I do not use dedicated ZIL drives. I also had several experiences with
> > "single" ZIL drive setups, but a dedicated ZIL is mostly useful in cases where you have a
> > graveyard full of inertia-suffering, mass-spinning HDDs - if I'm right, an SSD-based ZIL
> > would be of no use/effect in this case. So I omitted those.
> >
> >>
> >>> The pool is comprised of 7 drives as a RAIDZ1; one of the SSDs
> >>> faulted, but I pulled the wrong one, so the pool ran into a suspended
> >>> state.
> >>
> >> Can you put the drive you pulled back in?
> >
> > Every single SSD originally plugged in is now back in place, even the faulted one (which
> > doesn't report any faults at the moment).
> >
> > Although the pool isn't "importable", zdb reports its existence, alongside zroot (which
> > resides on a dedicated drive).
> >
> >>
> >>> The host is running the latest XigmaNAS BETA, which is effectively
> >>> FreeBSD 14.1-p2, just for the record.
> >>>
> >>> I do not want to give up, since I hoped there might be a rude but
> >>> effective way to restore the pool even with data losses ...
> >>>
> >>> Thanks in advance,
> >>>
> >>> Oliver
> >>> ....
> >>
> >> Good luck!
> >>
> >> Peace,
> >> david
> >
> >
> > Well, this is a hard and painful lesson to learn, if there is no chance to get the
> > pool back.
> >
> > A warning (but this seems to be useless in the realm of professionals): I used a bunch of
> > cheap spot-market SATA SSDs, a brand called "Intenso", common also here in good old Germany.
> > Some of those SSDs do have a working LED when used with a Fujitsu SAS HBA controller - but
> > those died very quickly, suffering some bus errors.
> > Another bunch of those SSDs do not have a working LED (not blinking on access), but lasted
> > a bit longer. The problem with those SSDs is: I cannot easily find the failing device by
> > accessing the failed drive, e.g. by writing massive data via dd, if possible.
> > I also ordered alternative SSDs from a more expensive brand - but bad Karma ...
> >
> > Oliver
> >
>
> The most useful thing to share right now would be the output of `zpool
> import` (with no pool name) on the rebooted system.
>
> That will show where the issues are, and suggest how they might be solved.
>

Hello, this is exactly what happens when trying to import the pool. Prior to the loss,
device da1p1 had been faulted, with numbers in the "corrupted data" column(s); those are
not seen now.

~# zpool import
   pool: BUNKER00
     id: XXXXXXXXXXXXXXXXXXXX
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-72
 config:

        BUNKER00      FAULTED  corrupted data
          raidz1-0    ONLINE
            da2p1     ONLINE
            da3p1     ONLINE
            da4p1     ONLINE
            da7p1     ONLINE
            da6p1     ONLINE
            da1p1     ONLINE
            da5p1     ONLINE

~# zpool import -f BUNKER00
cannot import 'BUNKER00': I/O error
        Destroy and re-create the pool from
        a backup source.

~# zpool import -F BUNKER00
cannot import 'BUNKER00': one or more devices is currently unavailable

--
A FreeBSD user
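Regarding the hope of poking at deeper structures via zdb: a minimal sketch of read-only
label inspection, assuming the device names from the output above. zdb only reads the
on-disk labels here and does not modify the pool, so this is safe to run before anything
else:

~# zdb -l /dev/da1p1     # dump the vdev labels of the previously faulted device
~# zdb -lu /dev/da1p1    # same, but also show the uberblocks (which txgs are still on disk)
~# zdb -e BUNKER00       # try to examine the non-imported pool by name

Comparing the labels of da1p1 with a healthy member (e.g. da2p1) shows whether the device
still carries a consistent pool configuration and recent transaction groups.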
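Since plain -f and -F already fail, a sketch of the usual escalation, hedged as a
last-resort attempt rather than a guaranteed fix: import read-only so nothing is written,
first as a dry run, and only then try the extreme rewind (-X), which can take a long time
and discards the most recent transaction groups:

~# zpool import -o readonly=on -f -F -n BUNKER00    # dry run: report whether a rewind import could succeed
~# zpool import -o readonly=on -f -F BUNKER00       # read-only rewind import
~# zpool import -o readonly=on -f -F -X BUNKER00    # extreme rewind, only if the above still fails

If any of these imports the pool, copy the data off (zfs send or plain cp) before
destroying and recreating the pool.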