Date: Tue, 9 Apr 2019 15:27:57 -0500 From: Karl Denninger <karl@denninger.net> To: freebsd-stable@freebsd.org Subject: Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20) Message-ID: <9a96b1b5-9337-fcae-1a2a-69d7bb24a5b3@denninger.net> In-Reply-To: <c96e31ad-6731-332e-5d2d-7be4889716e1@FreeBSD.org> References: <f87f32f2-b8c5-75d3-4105-856d9f4752ef@denninger.net> <c96e31ad-6731-332e-5d2d-7be4889716e1@FreeBSD.org>
index | next in thread | previous in thread | raw e-mail
[-- Attachment #1 --]
On 4/9/2019 15:04, Andriy Gapon wrote:
> On 09/04/2019 22:01, Karl Denninger wrote:
>> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
>> IN USE AREA was examined, compared, and blocks not on the "new member"
>> or changed copied over.
> I think that that's not entirely correct.
> ZFS maintains something called DTL, a dirty-time log, for a missing / offlined /
> removed device. When the device re-appears and gets resilvered, ZFS walks only
> those blocks that were born within the TXG range(s) when the device was missing.
>
> In any case, I do not have an explanation for what you are seeing.
That implies something much more-serious could be wrong such as given
enough time -- a week, say -- that the DTL marker is incorrect and some
TXGs that were in fact changed since the OFFLINE are not walked through
and synchronized. That would explain why it gets caught by a scrub --
the resilver is in fact not actually copying all the blocks that got
changed and so when you scrub the blocks are not identical. Assuming
the detached disk is consistent that's not catastrophically bad IF
CAUGHT; where you'd get screwed HARD is in the situation where (for
example) you had a 2-unit mirror, detached one, re-attached it, resilver
says all is well, there is no scrub performed and then the
*non-detached* disk fails before there is a scrub. In that case you
will have permanently destroyed or corrupted data since the other disk
is allegedly consistent but there are blocks *missing* that were never
copied over.
Again this just showed up on 12.x; it definitely was *not* at issue in
11.1 at all. I never ran 11.2 in production for a material amount of
time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted
to 12.x) so I don't know if it is in play on 11.2 or not.
I'll see if it shows up again with 20.00.07.00 card firmware.
Of note I cannot reproduce this on my test box with EITHER 19.00.00.00
or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make
a crap-ton of changes, offline the second and reattach the third (in
effect mirroring the "take one to the vault" thing) with a couple of
hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile
bs=1m" sort of thing) "make me some new data that has to be resilvered"
workload. I don't know if that's because I need more entropy in the
filesystem than I can reasonably generate this way (e.g. more
fragmentation of files, etc) or whether it's a time-based issue (e.g.
something's wrong with the DTL/TXG thing as you note above in terms of
how it functions and it only happens if the time elapsed causes
something to be subject to a rollover or similar problem.)
I spent quite a lot of time trying to make reproduce the issue on my
"sandbox" machine and was unable -- and of note it is never a large
quantity of data that is impacted, it's usually only a couple of dozen
checksums that show as bad and fixed. Of note it's also never just one;
if there was a single random hit on a data block due to ordinary bitrot
sort of issues I'd expect only one checksum to be bad. But generating a
realistic synthetic workload over the amount of time involved on a
sandbox is not trivial at all; the system on which this is now happening
handles a lot of email and routine processing of various sorts including
a fair bit of database activity associated with network monitoring and
statistical analysis.
I'm assuming that using "offline" as a means to do this hasn't become
"invalid" as something that's considered "ok" as a means of doing this
sort of thing.... it certainly has worked perfectly well for a very long
time!
--
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/
[-- Attachment #2 --]
0 *H
010
`He 0 *H
00 H^Ōc!5
H0
*H
010 UUS10UFlorida10U Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CA0
170817164217Z
270815164217Z0{10 UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA0"0
*H
0
h-5B>[;olӴ0~͎O9}9Ye*$g!ukvʶLzN`jL>MD'7U 45CB+kY`bd~b*c3Ny-78ju]9HeuέsӬDؽmgwER?&UURj'}9nWD i`XcbGz \gG=u%\Oi13ߝ4
K44pYQr]Ie/r0+eEޝݖ0C15Mݚ@JSZ(zȏ NTa(25DD5.l<g[[ZarQQ%Buȴ~~`IohRbʳڟu2MS8EdFUClCMaѳ !}ș+2k/bųE,n当ꖛ\(8WV8 d]b yXw ܊:I39
00U]^§Q\ӎ0U#0T039N0b010 UUS10UFlorida10U Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CA @Ui0U0 0U0
*H
:P U!>vJnio-#ן]WyujǑR̀Q
nƇ!GѦFg\yLxgw=OPycehf[}ܷ['4ڝ\[p 6\o.B&JF"ZC{;*o*mcCcLY߾`
t*S!(`]DHP5A~/NPp6=mhk밣'doA$86hm5ӚS@jެEgl
)0JG`%k35PaC?σ
׳HEt}!P㏏%*BxbQwaKG$6h¦Mve;[o-Iی&
I,Tcߎ#t wPA@l0P+KXBպT zGv;NcI3&JĬUPNa?/%W6G۟N000 k#Xd\=0
*H
0{10 UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA0
170817212120Z
220816212120Z0W10 UUS10UFlorida10U
Cuda Systems LLC10Ukarl@denninger.net0"0
*H
0
T[I-ΆϏ dn;Å@שy.us~_ZG%<MYd\gvfnsa1'6Egyjs"C [{~_K Pn+<*pv#Q+H/7[-vqDV^U>f%GX)H.|l`M(Cr>е͇6#odc"YljҦln8@5SA0&ۖ"OGj?UDWZ5 dDB7k-)9Izs-JAv
J6L$Ն1SmY.Lqw*SH;EF'DĦH]MOgQQ|Mٙג2Z9y@y]}6ٽeY9Y2xˆ$T=eCǺǵbn֛{j|@LLt1[Dk5:$= ` M 00<+00.0,+0 http://ocsp.cudasystems.net:88880 U0 0 `HB0U0U%0++03 `HB
&$OpenSSL Generated Client Certificate0U%՞V=;bzQ0U#0]^§Q\ӎϡ010 UUS10UFlorida10U Niceville10U
Cuda Systems LLC10UCuda Systems CA1!0UCuda Systems LLC 2017 CA H^Ōc!5
H0U0karl@denninger.net0
*H
۠A0-j%--$%g2#ޡ1^>{K+uGEv1ş7Af&b&O;.;A5*U)ND2bF|\=]<sˋL!wrw٧>YMÄ3\mWR hSv!_zvl? 3_ xU%\^#O*Gk̍YI_&Fꊛ@&1n } ͬ:{hTP3B.;bU8:Z=^Gw8!k-@xE@i,+'Iᐚ:fhztX7/(hY` O.1}a`%RW^akǂpCAufgDix UTЩ/7}%=jnVZvcF<M=
2^GKH5魉
_O4ެByʈySkw=5@h.0z>
W1000{10 UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA k#Xd\=0
`He E0 *H
1 *H
0 *H
1
190409202757Z0O *H
1B@8\νoӹQŘ(__of'[Qn/6ATP9eSXuTb0l *H
1_0]0 `He*0 `He0
*H
0*H
0
*H
@0+0
*H
(0 +7100{10 UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA k#Xd\=0*H
10{10 UUS10UFlorida10U
Cuda Systems LLC10UCuda Systems CA1%0#UCuda Systems LLC 2017 Int CA k#Xd\=0
*H
JÞl+eZ}[K
Cg@'WZWVs+3~ ~+ȭ!J^KdG:X4O=a&nz&~ $Ǧ(4^kn0̒p'rVR|Y+_ѥ Uk.^Vܥxc!dkP= JbEJ3it`0Ti3_!)~:!f ReH#ٮV\!sO]sLO)Bf|n{E9>)+~x'v0aG=3ͬvZ/p͌4rA+lb
.~s:]U^U='?xI^ڠ+Z0d,DԐZnU[}XVЪə^o9atG+=U*\YB=0ϜhPO
<Qe%n<Ч1gsFBj]0^-fljnmѷ=BBRc
help
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9a96b1b5-9337-fcae-1a2a-69d7bb24a5b3>
