Date: Mon, 25 Apr 2016 10:07:54 +0200
From: Maciej Suszko
To: "Michael B. Eichorn"
Cc: freebsd-fs@freebsd.org
Subject: Re: GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)
Message-ID: <20160425100754.0db9cd2b@helium>
In-Reply-To: <1461560445.22294.53.camel@michaeleichorn.com>

On Mon, 25 Apr 2016 01:00:45 -0400
"Michael B. Eichorn" wrote:

> I just ran into something rather unexpected. I have a pool consisting
> of a mirrored pair of GELI-encrypted partitions on WD Red 3TB disks.
>
> The machine is running 10.3-RELEASE. The root zpool was set up with
> GELI encryption from the installer; the pool that is acting up was
> set up per the handbook.
>
> See the timeline below for what happened. tl;dr: zpool scrub destroyed
> the eli devices, and my attempt to recreate the eli device earned me a
> ZFS-8000-8A critical error (corrupted data).
>
> All of the errors reported with zpool status -v are metadata and not
> regular files, but as I now have permanent metadata errors I am
> looking for guidance as to:
>
> 1) Is it safe to keep running the pool as-is for a day or two, or am I
> risking data corruption?
>
> 2) It would be much, much faster to copy the data to another pool,
> then recreate the pool and copy the data back, rather than restore
> from backups. Am I looking at any potential data loss if I do this?
>
> 3) What information would be useful to generate for the PR? The error
> is reproducible, so what should be tried before I nuke the pool?
>
> Thanks,
> Ike
>
> -- TIMELINE --
>
> I had just noticed that I had failed to enable the zpool scrub
> periodic on this machine, so I began to run zpool scrub by hand. It
> succeeded for the root pool, which is also GELI encrypted, but when I
> ran it against my primary data pool I encountered:
>
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.
>
> And the scrub failed to initialize (the command never returned to the
> shell).
>
> I then performed a reboot, which succeeded and brought everything up as
> normal. I then attempted to scrub the pool again. This time I only
> lost one of the partitions:
>
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.
>
> I then performed a geli attach and zpool online, which onlined the
> disk that was offline and offlined the disk that was online (EEEK!):
>
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.
> Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256
> Apr 24 23:38:28 terra kernel: GEOM_ELI:     Crypto: hardware
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> Apr 24 23:41:05 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
> Apr 24 23:41:05 terra devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515''
> Apr 24 23:41:05 terra ZFS: vdev state changed, pool_guid=5890893416839487107 vdev_guid=17504861086892353515
>
> I immediately rebooted and both disks came back and resilvered, with
> permanent metadata errors.
>
> -- END TIMELINE --

Hi,

Configure your geli devices not to autodetach on last close. Something
like this in your rc.conf should work:

geli_ada2p1_autodetach="NO"
geli_ada3p1_autodetach="NO"

-- 
regards, Maciej Suszko.
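
For providers other than ada2p1 and ada3p1, the per-device variable names used in the rc.conf lines above can be generated rather than typed by hand. What follows is only a minimal sketch in POSIX sh. It assumes the rc.d geli scripts build the variable name by replacing '/' and '-' in the provider name with '_' (worth checking against /etc/rc.d/geli on your release). The function name geli_noautodetach_lines is hypothetical and not part of FreeBSD.

```sh
#!/bin/sh
# Hypothetical helper: print one rc.conf line per GELI provider that
# turns off detach-on-last-close for that provider.
geli_noautodetach_lines() {
    for provider in "$@"; do
        # Assumption: the rc.d geli scripts map '/' and '-' in the provider
        # name to '_' when looking up geli_<provider>_autodetach.
        name=$(printf '%s' "$provider" | tr '/-' '__')
        printf 'geli_%s_autodetach="NO"\n' "$name"
    done
}

# The two providers that appear in the kernel log above.
geli_noautodetach_lines ada2p1 ada3p1
```

The output is meant to be reviewed and then appended to /etc/rc.conf. A provider attached by hand is a separate case: geli attach only marks it for detach on last close when given the -d flag.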