Date:      Fri, 12 Feb 2021 11:37:17 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: Reading a corrupted file on ZFS
Message-ID:  <2f82f113-9ca1-99a9-a433-89e3ae5edcbe@denninger.net>
In-Reply-To: <10977ffc-f806-69dd-0cef-d4fd4fc5f649@artem.ru>
References:  <da892eeb-233f-551f-2faa-62f42c3c1d5b@artem.ru> <0ca45adf-8f60-a4c3-6264-6122444a3ffd@denninger.net> <899c6b4f-2368-7ec2-4dfe-fa09fab35447@artem.ru> <20210212165216.2f613482@fabiankeil.de> <10977ffc-f806-69dd-0cef-d4fd4fc5f649@artem.ru>

On 2/12/2021 11:22, Artem Kuchin wrote:
> 12.02.2021 18:52, Fabian Keil wrote:
>> Artem Kuchin <artem@artem.ru> wrote on 2021-02-12:
>>
>>> 12.02.2021 18:06, Karl Denninger wrote:
>>>> Blocking the read forces you to get the good copy off backup media and
>>>> thus prevents that from happening.
>>>>
>>> I know what ZFS does and i damaged the same file in the same place on
>>> purpose. Question is: how to read what's left of it. Just for kicks, i
>>> don't have a backup, and i need to read what's left. It could be 1GB
>>> file with only one byte damaged and it is of crazy importance to me. So,
>>> how to bypass all the checks and make it read the file no matter what?
>> The patch from this PR adds a sysctl that allows to send corrupted data:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221909
>>
>> Using the added sysctl you can send and receive the dataset and then
>> read the corrupted file from the received dataset. Note that ZFS replaces
>> corrupted blocks completely with the 0x'zfs badd bloc' pattern instead
>> of returning the corrupted data as is, thus increasing the amount of
>> corruption in case of simple bit flips to whole blocks.
>>
>> Fabian
>
> Arghh. That's not what i want. This is strange. In case of stupid old 
> FS like FAT or even newer UFS i can dig into damaged file and collect 
> as much data as possible, while newer ZFS does not provide tools to 
> dig into data. That was always my concern about ZFS. If something 
> goes bad with FAT/NTFS and even UFS - there are tons of tools which 
> can dissect the file system into bits so i can get as much as possible 
> of what's left. In case of ZFS there are no tools that i know of and even 
> ZFS itself does not allow to get what's left of normal data.
>
> This is frustrating. why..why..

You created a synthetic situation that almost never exists in the real 
world: ONE byte modified in all copies in the same allocation block, 
while all other data in that block is intact and recoverable.

In almost all actual cases of "bit rot" the damage is exactly that: 
random, and statistically extraordinarily unlikely to hit all copies at 
once in the same allocation block.  Therefore, ZFS can and does fix it; 
UFS or FAT silently returns the corrupted data, propagates it, and 
eventually screws you down the road.

The nearly-every-case situation in the real world where a disk goes 
physically bad (I've had this happen *dozens* of times over my IT 
career) results in the drive being unable to return the block at all. 
You don't get all but the bad byte back; you get nothing for that block, 
and any attempt to "touch" it results in either a hard error coming back 
with no data in the buffer or (if not a TLER device) a wildly extended 
timeout before an I/O error is returned with, again, no usable data in 
the buffer.  On "old" Winchester-style spinning media and even floppy 
drives this resulted in an entire physical sector (usually 512 bytes) 
being irretrievably lost.  In the case of a "modern" zoned or 
advanced-format hard drive or an SSD the amount of data impacted and 
unreadable is typically much larger than one sector.  For an SSD it is 
frequently *at least* a 4k block (which can and frequently does span 
multiple files!), and for many instances of rotating rust it can be an 
entire *track* if the servo data is where the fault lies, which can be a 
*huge* amount of data.

The patch gives you all but one allocation block of data from ZFS, with 
that one block effectively zeroed.  This is no worse than the usual 
actual (not your synthesized test) impact of such a failure in the 
real world with other filesystems in virtually every instance where it 
happens "in the wild."
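
To make the "all but one block" outcome concrete, here is a minimal 
Python sketch. It is not the patch from the PR and does not touch a real 
pool. It assumes a read that hits the damaged block fails with an I/O 
error while every other block reads normally, which is worth verifying 
on a given system. The names salvage_blocks and make_faulty_reader are 
hypothetical, and the 128 KiB default only mirrors the usual ZFS default 
recordsize.

```python
import errno


def salvage_blocks(read_block, total_size, block_size=128 * 1024):
    """Read block by block; zero-fill any block whose read raises OSError.

    read_block(offset, length) is a caller-supplied function (an assumption
    of this sketch) that returns bytes or raises OSError.
    Returns (recovered_bytes, list_of_lost_block_numbers).
    """
    out = bytearray()
    lost = []
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            data = read_block(offset, length)
        except OSError:
            # The whole block is gone, not just the damaged byte.
            data = b""
            lost.append(offset // block_size)
        # Pad (or trim) so the output keeps the original file layout.
        out += data.ljust(length, b"\0")[:length]
    return bytes(out), lost


def make_faulty_reader(payload, bad_offset, block_size):
    """Simulate a file in which the block containing bad_offset is unreadable."""
    bad_block = bad_offset // block_size

    def read_block(offset, length):
        if offset // block_size == bad_block:
            raise OSError(errno.EIO, "simulated unreadable block")
        return payload[offset:offset + length]

    return read_block


if __name__ == "__main__":
    payload = bytes(range(256)) * 16          # 4096 bytes of sample data
    reader = make_faulty_reader(payload, bad_offset=1000, block_size=512)
    recovered, lost = salvage_blocks(reader, len(payload), block_size=512)
    print("lost blocks:", lost)
    print("bytes differing from original:",
          sum(a != b for a, b in zip(recovered, payload)))
```

With a real file, read_block could be a thin wrapper around os.pread on 
an open file descriptor. The point of the sketch is only that a single 
damaged byte costs the whole block it lives in while the rest of the 
file survives.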

In short there are very, very few actual "in the wild" failures where 
one byte is damaged and the rest surrounding that one byte is intact and 
retrievable.  In most cases where an actual failure occurs the 
unreadable data constitutes *at least* a physical sector.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/


Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2f82f113-9ca1-99a9-a433-89e3ae5edcbe>