Date:      Mon, 25 Apr 2016 10:07:54 +0200
From:      Maciej Suszko <maciej@suszko.eu>
To:        "Michael B. Eichorn" <ike@michaeleichorn.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)
Message-ID:  <20160425100754.0db9cd2b@helium>
In-Reply-To: <1461560445.22294.53.camel@michaeleichorn.com>
References:  <1461560445.22294.53.camel@michaeleichorn.com>

On Mon, 25 Apr 2016 01:00:45 -0400
"Michael B. Eichorn" <ike@michaeleichorn.com> wrote:

> I just ran into something rather unexpected. I have a pool consisting
> of a mirrored pair of geli encrypted partitions on WD Red 3TB disks.
>
> The machine is running 10.3-RELEASE, the root zpool was set up with
> GELI encryption from the installer, the pool that is acting up was
> set up per the handbook.
>
> See the below timeline for what happened, tldr: zpool scrub destroyed
> the eli devices, my attempt to recreate the eli device earned me a
> ZFS-8000-8A critical error (corrupted data).
>
> All of the errors reported with zpool status -v are metadata and not
> regular files, but as I now have permanent metadata errors I am
> looking for guidance as to:
>
> 1) Is it safe to keep running the pool as-is for a day or two or am I
> risking data corruption?
>
> 2) It would be much much faster to copy the data to another pool, then
> recreate the pool and copy the data back, rather than restore from
> backups, am I looking at any potential data loss if I do this?
>
> 3) What information would be useful to generate for the PR, the error
> is reproducible so what should be tried before I nuke the pool?
>
> Thanks,
> Ike
>
> -- TIMELINE --
>
> I had just noticed that I had failed to enable the zpool scrub
> periodic on this machine. So I began to run zpool scrub by hand. It
> succeeded for the root pool which is also geli encrypted, but when I
> ran it against my primary data pool I encountered:
>
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.
>
> And the scrub failed to initialize (command never returned to the
> shell).
>
> I then performed a reboot, which succeeded and brought everything up as
> normal. I then attempted to scrub the pool again. This time I only
> lost one of the partitions:
>
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.
>
> I then performed a geli attach and zpool online, which onlined the
> disk that was offline and offlined the disk that was online (EEEK!):
>
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256
> Apr 24 23:38:28 terra kernel: GEOM_ELI:     Crypto: hardware
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
> Apr 24 23:41:05 terra devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515''
> Apr 24 23:41:05 terra ZFS: vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515
>
> I immediately rebooted and both disks came back and resilvered, with
> permanent metadata errors
>
> -- END TIMELINE --

Hi,

Configure your geli devices not to autodetach on last close.
Something like this in your /etc/rc.conf should work:

geli_ada2p1_autodetach="NO"
geli_ada3p1_autodetach="NO"
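
For context, a slightly fuller sketch of how this could look in
/etc/rc.conf. The provider names come from the log lines quoted above;
the commented-out global variable is an alternative that assumes you
want the same behaviour for every GELI provider the rc scripts handle,
which may not be what you need.

```sh
# /etc/rc.conf (sketch, adjust provider names to your system)

# Per-provider override: do not mark these two providers for
# detach on last close.
geli_ada2p1_autodetach="NO"
geli_ada3p1_autodetach="NO"

# Alternative (assumption: applies to all GELI providers handled by
# the rc scripts): change the global default instead of listing
# each provider.
#geli_autodetach="NO"
```

As far as I know the rc scripts read these variables during startup, so
the change should take effect from the next boot rather than on devices
that are already attached.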
-- 
regards, Maciej Suszko.
