Date: Fri, 12 Feb 2021 11:37:17 -0500
From: Karl Denninger <karl@denninger.net>
To: freebsd-fs@freebsd.org
Subject: Re: Reading a corrupted file on ZFS
Message-ID: <2f82f113-9ca1-99a9-a433-89e3ae5edcbe@denninger.net>
In-Reply-To: <10977ffc-f806-69dd-0cef-d4fd4fc5f649@artem.ru>
References: <da892eeb-233f-551f-2faa-62f42c3c1d5b@artem.ru> <0ca45adf-8f60-a4c3-6264-6122444a3ffd@denninger.net> <899c6b4f-2368-7ec2-4dfe-fa09fab35447@artem.ru> <20210212165216.2f613482@fabiankeil.de> <10977ffc-f806-69dd-0cef-d4fd4fc5f649@artem.ru>

On 2/12/2021 11:22, Artem Kuchin wrote:
> 12.02.2021 18:52, Fabian Keil wrote:
>> Artem Kuchin <artem@artem.ru> wrote on 2021-02-12:
>>
>>> 12.02.2021 18:06, Karl Denninger wrote:
>>>> Blocking the read forces you to get the good copy off backup media and
>>>> thus prevents that from happening.
>>>>
>>> I know what ZFS does and i damaged the same file in the same place on
>>> purpose. Question is: how to read what's left of it. Just for kicks, i
>>> don't have a backup, and i need to read what's left. It could be 1GB
>>> file with only one byte damaged and it is of crazy importance to me.
>>> So, how to bypass all the checks and make it read the file no matter
>>> what?
>> The patch from this PR adds a sysctl that allows to send corrupted data:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221909
>>
>> Using the added sysctl you can send and receive the dataset and then
>> read the corrupted file from the received dataset. Note that ZFS
>> replaces corrupted blocks completely with the 0x'zfs badd bloc' pattern
>> instead of returning the corrupted data as is, thus increasing the
>> amount of corruption in case of simple bit flips to whole blocks.
>>
>> Fabian
>
> Arghh. That's not what i want. This is strange. In case of stupid old
> FS like FAT or even newer UFS i can dig into damaged file and collect
> as much data as possible, while newer ZFS does not provide tools to
> dig into data. That's was always my concern about ZFS. If something
> bad goes with FAT/NTFS and even UFS - there are tons of tools which
> can dissect the file system into bits so i can get as much as possible
> of what's left. In case of ZFS there are no tools that i know and even
> ZFS itself does not allow to get what left of normal data.
>
> This is frustrating. why..why..

You created a synthetic situation that almost never exists in the real
world: ONE byte modified in all copies in the same allocation block,
with all other data in that block intact and recoverable. In almost all
actual cases of "bit rot" the damage is exactly that: random, and by
statistics extraordinarily unlikely to hit all copies at once in the
same allocation block. Therefore, ZFS can and does fix it; UFS or FAT
silently returns the corrupted data, propagates it, and eventually
screws you down the road.

The nearly-every-case situation in the real world, where a disk goes
physically bad (I've had this happen *dozens* of times over my IT
career), results in the drive being unable to return the block at all.
You don't get all but the bad byte back; you get nothing for that block,
and any attempt to "touch" it results in either a hard error coming back
with no data in the buffer or (if not a TLER device) a wildly extended
timeout before an I/O error is returned with, again, no usable data in
the buffer.

On "old" Winchester-style spinning media and even floppy drives this
resulted in an entire physical sector (usually 512 bytes) being
irretrievably lost. In the case of a "modern" zoned or advanced-format
hard drive or an SSD, the amount of data impacted and unreadable is
typically much larger than one sector. For an SSD it is frequently *at
least* a 4k block (which can and frequently does span multiple files!),
and for many instances of rotating rust it can be an entire *track* if
the servo data is where the fault lies, which can be a *huge* amount of
data.

The patch gives you all but one allocation block of data from ZFS, with
that one block effectively zeroed. This is no worse than the usual
actual (not your synthesized test) impact of such a failure in the real
world with other filesystems, in virtually every instance where it
happens "in the wild."
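
To make the block-granular loss concrete, here is a minimal Python sketch of the idea, not of ZFS itself. It assumes a hypothetical 4096-byte block size, uses SHA-256 as a stand-in checksum, and fills a failed block with zero bytes instead of the marker pattern the patch writes; the names `split_blocks` and `salvage` are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4096    # hypothetical allocation block size, for illustration only
BAD_FILL = b"\x00"   # stand-in fill byte; the real patch writes its own marker pattern


def checksum(block: bytes) -> bytes:
    """Per-block checksum (SHA-256 here as a stand-in for the filesystem's)."""
    return hashlib.sha256(block).digest()


def split_blocks(data: bytes) -> list:
    """Cut the data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def salvage(stored_blocks, checksums):
    """Return the readable data plus the indexes of blocks that failed.

    A block whose checksum does not match is replaced wholesale by the
    fill pattern, because the checksum cannot say which byte is wrong.
    """
    out, lost = [], []
    for index, (block, expected) in enumerate(zip(stored_blocks, checksums)):
        if checksum(block) == expected:
            out.append(block)
        else:
            out.append(BAD_FILL * len(block))
            lost.append(index)
    return b"".join(out), lost


# Four blocks of known data with their checksums recorded at "write" time.
original = bytes(range(256)) * 64
blocks = split_blocks(original)
sums = [checksum(b) for b in blocks]

# Flip a single bit in block 2, imitating the synthetic one-byte damage.
damaged = bytearray(blocks[2])
damaged[10] ^= 0x01
blocks[2] = bytes(damaged)

recovered, lost_blocks = salvage(blocks, sums)
print("lost blocks:", lost_blocks)
```

One flipped bit costs the whole block while every other block comes back byte for byte, which is the same granularity a failed physical sector gives you on any filesystem.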

In short, there are very, very few actual "in the wild" failures where
one byte is damaged and the rest surrounding that one byte is intact and
retrievable. In most cases where an actual failure occurs, the
unreadable data constitutes *at least* a physical sector.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?2f82f113-9ca1-99a9-a433-89e3ae5edcbe>
