Date: Tue, 7 May 2019 21:12:50 -0400
From: Joe Maloney <jmaloney@ixsystems.com>
To: Michelle Sullivan <michelle@sorbs.net>
Cc: Karl Denninger <karl@denninger.net>, freebsd-stable@freebsd.org
Subject: Re: ZFS...
Message-ID: <BA5AC6FC-246A-47DC-B4D9-16106B2C5FB7@ixsystems.com>
In-Reply-To: <a1b78a63-0ef1-af51-4e33-a9a97a257c8b@sorbs.net>
References: <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net>
 <CAOtMX2gf3AZr1-QOX_6yYQoqE-H%2B8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com>
 <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net>
 <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it>
 <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net>
 <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de>
 <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net>
 <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com>
 <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net>
 <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net>
 <2e4941bf-999a-7f16-f4fe-1a520f2187c0@sorbs.net>
 <20190430102024.E84286@mulder.mintsol.com>
 <41FA461B-40AE-4D34-B280-214B5C5868B5@punkt.de>
 <20190506080804.Y87441@mulder.mintsol.com>
 <08E46EBF-154F-4670-B411-482DCE6F395D@sorbs.net>
 <33D7EFC4-5C15-4FE0-970B-E6034EF80BEF@gromit.dlib.vt.edu>
 <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net>
 <a82bfabe-a8c3-fd9a-55ec-52530d4eafff@denninger.net>
 <a1b78a63-0ef1-af51-4e33-a9a97a257c8b@sorbs.net>

You might look at UFS Explorer. It claims to have ZFS support now. It costs money for a license and I think it required Windows last I used it. I can attest that a previous version allowed me to recover all the data I needed from a lost UFS mirror almost a decade ago.

Sent from my iPhone

> On May 7, 2019, at 9:01 PM, Michelle Sullivan <michelle@sorbs.net> wrote:
>
> Karl Denninger wrote:
>>> On 5/7/2019 00:02, Michelle Sullivan wrote:
>>> The problem I see with that statement is that the zfs dev mailing lists constantly and consistently follow the line of, the data is always right, there is no need for a “fsck” (which I actually get) but it’s used to shut down every thread... the irony is I’m now installing Windows 7 and SP1 on a USB stick (well it’s actually installed, but SP1 isn’t finished yet) so I can install a zfs data recovery tool which purports to be able to “walk the data” to retrieve all the files... the irony eh... install Windows 7 on a USB stick to recover a FreeBSD installed zfs filesystem... will let you know if the tool works, but as it was recommended by a dev I’m hopeful... have another array (with zfs I might add) loaded and ready to go... if the data recovery is successful I’ll blow away the original machine and work out what OS and drive setup will be safe for the data in the future. I might even put FreeBSD and zfs back on it, but if I do it won’t be in the current Zraid2 config.
>> Meh.
>>
>> Hardware failure is, well, hardware failure. Yes, power-related
>> failures are hardware failures.
>>
>> Never mind the potential for /software/ failures. Bugs are, well,
>> bugs. And they're a real thing. Never had the shortcomings of UFS bite
>> you on an "unexpected" power loss? Well, I have. Is ZFS absolutely
>> safe against any such event? No, but it's safe*r*.
>
> Yes and no ... I'll explain...
>
>>
>> I've yet to have ZFS lose an entire pool due to something bad happening,
>> but the same basic risk (entire filesystem being gone)
>
> Every time I have seen this issue (and it's been more than once - though until now recoverable - even if extremely painful) - it's always been during a resilver of a failed drive and something happening... panic, another drive failure, power etc.. any other time it's rock solid... which is the yes and no... under normal circumstances zfs is very very good and seems as safe as or safer than UFS... but my experience is ZFS has one really bad flaw.. if there is a corruption in the metadata - even if the stored data is 100% correct - it will fault the pool and that's it, it's gone barring some luck and painful recovery (backups aside) ... other file systems suffer from this too, but there are tools that *the majority of the time* will get you out of the s**t with little pain. Barring this Windows-based tool I haven't been able to run yet, zfs appears to have nothing.
>
>> has occurred more
>> than once in my IT career with other filesystems -- including UFS, lowly
>> MSDOS and NTFS, never mind their predecessors all the way back to floppy
>> disks and the first 5MB Winchesters.
>
> Absolutely, been there done that.. and btrfs...*ouch* still as bad.. however with the only btrfs install I had (I didn't know it was btrfs underneath, but netgear NAS...) I was still able to recover the data even though it had screwed the file system so badly I vowed never to consider or use it again on anything ever...
>
>>
>> I learned a long time ago that two is one and one is none when it comes
>> to data, and WHEN two becomes one you SWEAT, because that second failure
>> CAN happen at the worst possible time.
>
> and does..
>
>>
>> As for RaidZ2 vs. mirrored it's not as simple as you might think.
>> Mirrored vdevs can only lose one member per mirror set, unless you use
>> three-member mirrors.
>> That sounds insane but actually it isn't in
>> certain circumstances, such as very-read-heavy and high-performance-read
>> environments.
>
> I know - this is why I don't use mirrored - because wear patterns will ensure both sides of the mirror are closely matched.
>
>>
>> The short answer is that a 2-way mirrored set is materially faster on
>> reads but has no acceleration on writes, and can lose one member per
>> mirror. If the SECOND one fails before you can resilver, and that
>> resilver takes quite a long while if the disks are large, you're dead.
>> However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each
>> of a 2-way mirror) you now have three parallel data paths going at once
>> and potentially six for reads -- and performance is MUCH better. A
>> 3-way mirror can lose two members (and could be organized as 3x2) but
>> obviously requires lots of drive slots, 3x as much *power* per gigabyte
>> stored (and you pay for power twice; once to buy it and again to get the
>> heat out of the room where the machine is.)
>
> my problem (as always) is slots not so much the power.
>
>>
>> Raidz2 can also lose 2 drives without being dead. However, it doesn't
>> get any of the read performance improvement *and* takes a write
>> performance penalty; Z2 has more write penalty than Z1 since it has to
>> compute and write two parity entries instead of one, although in theory
>> at least it can parallel those parity writes -- albeit at the cost of
>> drive bandwidth congestion (e.g. interfering with other accesses to the
>> same disk at the same time.) In short RaidZx performs about as "well"
>> as the *slowest* disk in the set.
> Which is why I built mine with identical drives (though different production batches :) ) ... majority of the data in my storage array is write once (or twice) read many.
>
>> So why use it (particularly Z2) at
>> all?
>> Because for "N" drives you get the protection of a 3-way mirror
>> and *much* more storage. A six-member RaidZ2 setup returns ~4TB of
>> usable space, where with a 2-way mirror it returns 3TB and a 3-way
>> mirror (which provides the same protection against drive failure as Z2)
>> you have only *half* the storage. IMHO ordinary Raidz isn't worth the
>> trade-offs, but Z2 frequently is.
>>
>> In addition more spindles means more failures, all other things being
>> equal, so if you need "X" TB of storage and organize it as 3-way mirrors
>> you now have twice as many physical spindles which means on average
>> you'll take twice as many faults. If performance is more important then
>> the choice is obvious. If density is more important (that is, a lot or
>> even most of the data is rarely accessed at all) then the choice is
>> fairly simple too. In many workloads you have some of both, and thus
>> the correct choice is a hybrid arrangement; that's what I do here,
>> because I have a lot of data that is rarely-to-never accessed and
>> read-only but also have some data that is frequently accessed and
>> frequently written. One size does not fit all in such a workload.
> This is where I came to 2 systems (with different data) .. one was for density, the other performance. Storage vs working etc..
>
>> MOST systems, by the way, have this sort of paradigm (a huge percentage
>> of the data is rarely read and never written) but it doesn't become
>> economic or sane to try to separate them until you get well into the
>> terabytes of storage range and a half-dozen or so physical volumes.
>> There's a very clean argument that prior to that point, but with greater
>> than one drive, mirrored is always the better choice.
>>
>> Note that if you have an *adapter* go insane (and as I've noted here
>> I've had it happen TWICE in my IT career!) then *all* of the data on the
>> disks served by that adapter is screwed.
>
> 100% with you - been there done that...
> and it doesn't matter what OS or filesystem, hardware failure where silent data corruption happens because of an adapter will always take you out (and zfs will not save you in many cases of that either.)
>>
>> It doesn't make a bit of difference what filesystem you're using in that
>> scenario and thus you had better have a backup scheme and make sure it
>> works as well, never mind software bugs or administrator stupidity ("dd"
>> as root to the wrong target, for example, will reliably screw you every
>> single time!)
>>
>> For a single-disk machine ZFS is no *less* safe than UFS and provides a
>> number of advantages, with arguably the most-important being easily-used
>> snapshots.
>
> Depends. In normal operation I agree... but when it comes to all or nothing, that is a matter of perspective. Personally I prefer to have in-place recovery options and/or multiple *possible* recovery options rather than ... "destroy the pool and recreate it from scratch, hope you have backups"...
>
>> Not only does this simplify backups since coherency during
>> the backup is never at issue and incremental backups become fast and
>> easily-done, in addition boot environments make roll-forward and even
>> *roll-back* reasonable to implement for software updates -- a critical
>> capability if you ever run an OS version update and something goes
>> seriously wrong with it. If you've never had that happen then consider
>> yourself blessed;
>
> I have been there (especially in the early days (pre 0.83 kernel) versions of Linux :) )
>
>> it's NOT fun to manage in a UFS environment and often
>> winds up leading to a "restore from backup" scenario. (To be fair it
>> can be with ZFS too if you're foolish enough to upgrade the pool before
>> being sure you're happy with the new OS rev.)
>>
> Actually I have a simple way with UFS (and ext2/3/4 etc) ... split the boot disk almost down the center.. create 3 partitions.. root, swap, altroot.
> root and altroot are almost identical, one is always active, new OS goes on the other, switch to make the other active (primary) when you've tested... it only gives one level of roll forward/roll back, but it works for me and has never failed (boot disk/OS wise) since I implemented it... but then I don't let anyone else in the company have root access so they cannot dd or "rm -r . /" or "rm -r .*" (both of which are the only way I have done that before - back in 1994 and never done it since - it's something you learn or get out of IT :P .. and for those who didn't get the latter it should have been 'rm -r .??*' - and why are you on '-stable' ...? :P )
>
> Regards,
>
> --
> Michelle Sullivan
> http://www.mhix.org/
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
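
The capacity comparison quoted in this message (six drives: RaidZ2 about 4TB, 2-way mirrors 3TB, 3-way mirrors half of the RaidZ2 figure) can be checked with a rough sketch. It assumes 1TB drives, which the thread implies but never states, and the function name and layout labels are hypothetical, not part of any ZFS tool. It ignores metadata, padding and reserved space, so a real pool would report somewhat less.

```python
# Hypothetical helper for illustration only; not a ZFS API.
# Rough usable capacity for the three layouts compared in the thread.


def usable_tb(drives: int, drive_tb: float, layout: str) -> float:
    """Approximate usable capacity in TB, ignoring filesystem overhead."""
    if layout == "mirror2":
        # Striped 2-way mirrors: one drive's worth of data per pair.
        return (drives // 2) * drive_tb
    if layout == "mirror3":
        # Striped 3-way mirrors: one drive's worth of data per triple.
        return (drives // 3) * drive_tb
    if layout == "raidz2":
        # Single raidz2 vdev: two drives' worth of space goes to parity.
        return (drives - 2) * drive_tb
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # Assumption: six 1TB drives, matching the figures quoted above.
    for layout in ("raidz2", "mirror2", "mirror3"):
        print(f"{layout}: {usable_tb(6, 1.0, layout)} TB")
```

Under that assumption the sketch gives 4, 3 and 2 TB respectively, which matches the figures in the quoted text.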
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BA5AC6FC-246A-47DC-B4D9-16106B2C5FB7>