Date:      Mon, 25 Apr 2016 01:00:45 -0400
From:      "Michael B. Eichorn" <ike@michaeleichorn.com>
To:        freebsd-fs@freebsd.org
Subject:   GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)
Message-ID:  <1461560445.22294.53.camel@michaeleichorn.com>

I just ran into something rather unexpected. I have a pool consisting
of a mirrored pair of geli encrypted partitions on WD Red 3TB disks.

The machine is running 10.3-RELEASE. The root zpool was set up with GELI
encryption from the installer; the pool that is acting up was set up per
the handbook.

See the timeline below for what happened. TL;DR: zpool scrub destroyed
the eli devices, and my attempt to recreate the eli device earned me a
ZFS-8000-8A critical error (corrupted data).

All of the errors reported by zpool status -v are in metadata, not
regular files, but as I now have permanent metadata errors I am
looking for guidance as to:

1) Is it safe to keep running the pool as-is for a day or two or am I
risking data corruption?

2) It would be much faster to copy the data to another pool, then
recreate the pool and copy the data back, rather than restore from
backups. Am I looking at any potential data loss if I do this?

3) What information would be useful to generate for the PR? The error is
reproducible, so what should be tried before I nuke the pool?

Thanks,
Ike

-- TIMELINE --

I had just noticed that I had failed to enable the periodic zpool scrub
on this machine, so I began to run zpool scrub by hand. It succeeded
for the root pool, which is also geli encrypted, but when I ran it
against my primary data pool I encountered:

Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last
close.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last
close.

And the scrub failed to initialize (command never returned to the
shell).

I then performed a reboot, which succeeded and brought everything up as
normal. I then attempted to scrub the pool again. This time I only lost
one of the partitions:

Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
Apr 24 23:37:34 terra kernel: GEOM_ELI: Detached ada2p1.eli on last
close.

I then performed a geli attach and zpool online, which onlined the disk
that was offline and offlined the disk that was online (EEEK!):

Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.
Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256
Apr 24 23:38:28 terra kernel: GEOM_ELI:     Crypto: hardware
Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
Apr 24 23:41:05 terra kernel: GEOM_ELI: Detached ada3p1.eli on last
close.
Apr 24 23:41:05 terra devd: Executing 'logger -p kern.notice -t ZFS
'vdev state changed, pool_guid=5890893416839487107
vdev_guid=17504861086892353515''
Apr 24 23:41:05 terra ZFS: vdev state changed,
pool_guid=5890893416839487107 vdev_guid=17504861086892353515

I immediately rebooted and both disks came back and resilvered, with
permanent metadata errors.

-- END TIMELINE --
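
For the PR, it may help to condense the kernel messages into a per-device
history of when each .eli provider was created or destroyed. The following
is only a minimal sketch, assuming the syslog line format shown in the
timeline above. The function name eli_events and the SAMPLE list are
illustrative (SAMPLE is an excerpt of the log lines quoted above), not part
of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Matches kernel log lines of the form quoted in the timeline, e.g.:
#   Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
# Assumption: timestamp, hostname, "kernel:", then the GEOM_ELI message.
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"GEOM_ELI:\s+Device\s+(?P<dev>\S+)\s+(?P<event>created|destroyed)\.\s*$"
)


def eli_events(lines):
    """Return {device: [(timestamp, event), ...]} in log order.

    Lines that are not GEOM_ELI create/destroy messages are ignored.
    """
    events = defaultdict(list)
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            events[match.group("dev")].append(
                (match.group("ts"), match.group("event"))
            )
    return dict(events)


# Excerpt of the messages from the second scrub attempt.
SAMPLE = [
    "Apr 24 23:37:34 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.",
    "Apr 24 23:38:28 terra kernel: GEOM_ELI: Device ada2p1.eli created.",
    "Apr 24 23:38:28 terra kernel: GEOM_ELI: Encryption: AES-XTS 256",
    "Apr 24 23:41:05 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.",
]

if __name__ == "__main__":
    for dev, history in eli_events(SAMPLE).items():
        for ts, event in history:
            print(f"{ts}  {dev}  {event}")
```

In practice the input would be the lines of the system log rather than
SAMPLE; the log file path is deliberately left out, since it depends on the
syslog configuration.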