Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 30 Apr 2019 08:11:29 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: ZFS...
Message-ID:  <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net>
In-Reply-To: <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net>
References:  <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net> <CAOtMX2gf3AZr1-QOX_6yYQoqE-H%2B8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com> <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net> <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it> <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net> <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de> <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net> <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com> <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net>

index | next in thread | previous in thread | raw e-mail

[-- Attachment #1 --]
On 4/30/2019 05:14, Michelle Sullivan wrote:
>> On 30 Apr 2019, at 19:50, Xin LI <delphij@gmail.com> wrote:
>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <michelle@sorbs.net> wrote:
>>> but in my recent experience 2 issues colliding at the same time results in disaster
>> Do we know exactly what kind of corruption happen to your pool?  If you see it twice in a row, it might suggest a software bug that should be investigated.
>>
>> All I know is it’s a checksum error on a meta slab (122) and from what I can gather it’s the spacemap that is corrupt... but I am no expert.  I don’t believe it’s a software fault as such, because this was cause by a hard outage (damaged UPSes) whilst resilvering a single (but completely failed) drive.  ...and after the first outage a second occurred (same as the first but more damaging to the power hardware)... the host itself was not damaged nor were the drives or controller.
.....
>> Note that ZFS stores multiple copies of its essential metadata, and in my experience with my old, consumer grade crappy hardware (non-ECC RAM, with several faulty, single hard drive pool: bad enough to crash almost monthly and damages my data from time to time),
> This was a top end consumer grade mb with non ecc ram that had been running for 8+ years without fault (except for hard drive platter failures.). Uptime would have been years if it wasn’t for patching.

Yuck.

I'm sorry, but that may well be what nailed you.

ECC is not just about the random cosmic ray.  It also saves your bacon
when there are power glitches.

Unfortunately however there is also cache memory on most modern hard
drives, most of the time (unless you explicitly shut it off) it's on for
write caching, and it'll nail you too.  Oh, and it's never, in my
experience, ECC.

In addition, however, and this is something I learned a LONG time ago
(think Z-80 processors!) is that as in so many very important things
"two is one and one is none."

In other words without a backup you WILL lose data eventually, and it
WILL be important.

Raidz2 is very nice, but as the name implies it you have two
redundancies.  If you take three errors, or if, God forbid, you *write*
a block that has a bad checksum in it because it got scrambled while in
RAM, you're dead if that happens in the wrong place.

> Yeah.. unlike UFS that has to get really really hosed to restore from backup with nothing recoverable it seems ZFS can get hosed where issues occur in just the wrong bit... but mostly it is recoverable (and my experience has been some nasty shit that always ended up being recoverable.)
>
> Michelle 

Oh that is definitely NOT true.... again, from hard experience,
including (but not limited to) on FreeBSD.

My experience is that ZFS is materially more-resilient but there is no
such thing as "can never be corrupted by any set of events."  Backup
strategies for moderately large (e.g. many Terabytes) to very large
(e.g. Petabytes and beyond) get quite complex but they're also very
necessary.

-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/

[-- Attachment #2 --]
0	*H
010
	`He0	*H

00H^Ōc!5
H0
	*H
010	UUS10UFlorida10U	Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CA0
170817164217Z
270815164217Z0{10	UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA0"0
	*H
0
h-5B>[;olӴ0~͎O9}9Ye*$g!ukvʶLzN`jL>MD'7U45CB+kY`bd~b*c3Ny-78ju]9HeuέsӬDؽmgwER?&UURj'}9nWD i`XcbGz\gG=u%\Oi13ߝ4
K44pYQr]Ie/r0+eEޝݖ0C15Mݚ@JSZ(zȏNTa(25DD5.l<g[[ZarQQ%Buȴ~~`IohRbʳڟu2MS8EdFUClCMaѳ!}ș+2k/bųE,n当ꖛ\(8WV8	d]b	yXw	܊:I39
00U]^§Q\ӎ0U#0T039N0b010	UUS10UFlorida10U	Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CA	@Ui0U00U0
	*H
:P U!>vJnio-#ן]WyujǑR̀Q
nƇ!GѦFg\yLxgw=OPycehf[}ܷ['4ڝ\[p6\o.B&JF"ZC{;*o*mcCcLY߾`
t*S!񫶭(`]DHP5A~/NPp6=mhk밣'doA$86hm5ӚS@jެEgl
)0JG`%k35PaC?σ
׳HEt}!P㏏%*BxbQwaKG$6h¦Mve;[o-Iی&
I,Tcߎ#t wPA@l0P+KXBպT	zGv;NcI3&JĬUPNa?/%W6G۟N000k#Xd\=0
	*H
0{10	UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA0
170817212120Z
220816212120Z0W10	UUS10UFlorida10U
Cuda Systems LLC10Ukarl@denninger.net0"0
	*H
0
T[I-ΆϏdn;Å@שy.us~_ZG%<MYd\gvfnsa1'6Egyjs"C [{~_KPn+<*pv#Q+H/7[-vqDV^U>f%GX)H.|l`M(Cr>е͇6#odc"YljҦln8@5SA0&ۖ"OGj?UDWZ5	dDB7k-)9Izs-JAv
J6L$Ն1SmY.Lqw*SH;EF'DĦH]MOgQQ|Mٙג2Z9y@y]}6ٽeY9Y2xˆ$T=eCǺǵbn֛{j|@LLt1[Dk5:$=	`	M00<+00.0,+0 http://ocsp.cudasystems.net:88880	U00	`HB0U0U%0++03	`HB
&$OpenSSL Generated Client Certificate0U%՞V=؁;bzQ0U#0]^§Q\ӎϡ010	UUS10UFlorida10U	Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CAH^Ōc!5
H0U0karl@denninger.net0
	*H
۠A0-j%--$%g2#ޡ1^>{K+uGEv1ş7Af&b&O;.;A5*U)ND2bF|\=]<sˋL!wrw٧>YMÄ3\mWR hSv!_zvl? 3_ xU%\^#O*Gk̍YI_&Fꊛ@&1n”} ͬ:{hTP3B.;bU8:Z=^Gw8!k-@xE@i,+'Iᐚ:fhztX7/(hY` O.1}a`%RW^akǂpCAufgDixUTЩ/7}%=jnVZvcF<M=
2^GKH5魉
_O4ެByʈySkw=5@h.0z>
W1000{10	UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CAk#Xd\=0
	`HeE0	*H
	1	*H
0	*H
	1
190430131129Z0O	*H
	1B@/(s|A!?^g;9%B[h$S|w[yBKj]L}w0l	*H
	1_0]0	`He*0	`He0
*H
0*H
0
*H
@0+0
*H
(0	+7100{10	UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CAk#Xd\=0*H
	10{10	UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CAk#Xd\=0
	*H
)ֹh1 %6y|)Usg~
xϠs-OT
3RG$zsBk̠H]+mG~Z$~֯&,}f9;B35UmNm%ZAggWatBhj/u2ҠJQAHVVM	}M^x-dN?z/x|Z;{Fê734@{
n6)X	?ݘPzJz)y-{N(؇]̺JZZa7,4d
+%^IfnE7}Piu.2Ubv#G.]d$l#cass]JHwX}uk^ae͖+bx]X0lVH1UYN0罙mշR"*r*hZ; ؆Դ]	Gl8)t=t.x݇K	šuTHy2B1gJ!I[
help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d0118f7e-7cfc-8bf1-308c-823bce088039>