Date: Tue, 7 May 2019 08:46:26 -0500 From: Karl Denninger <karl@denninger.net> To: freebsd-stable@freebsd.org Subject: Re: ZFS... Message-ID: <a82bfabe-a8c3-fd9a-55ec-52530d4eafff@denninger.net> In-Reply-To: <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net> References: <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net> <CAOtMX2gf3AZr1-QOX_6yYQoqE-H%2B8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com> <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net> <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it> <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net> <70fac2fe3f23f85dd442d93ffea368e1@ultra-secure.de> <70C87D93-D1F9-458E-9723-19F9777E6F12@sorbs.net> <CAGMYy3tYqvrKgk2c==WTwrH03uTN1xQifPRNxXccMsRE1spaRA@mail.gmail.com> <5ED8BADE-7B2C-4B73-93BC-70739911C5E3@sorbs.net> <d0118f7e-7cfc-8bf1-308c-823bce088039@denninger.net> <2e4941bf-999a-7f16-f4fe-1a520f2187c0@sorbs.net> <20190430102024.E84286@mulder.mintsol.com> <41FA461B-40AE-4D34-B280-214B5C5868B5@punkt.de> <20190506080804.Y87441@mulder.mintsol.com> <08E46EBF-154F-4670-B411-482DCE6F395D@sorbs.net> <33D7EFC4-5C15-4FE0-970B-E6034EF80BEF@gromit.dlib.vt.edu> <A535026E-F9F6-4BBA-8287-87EFD02CF207@sorbs.net>
On 5/7/2019 00:02, Michelle Sullivan wrote:
> The problem I see with that statement is that the zfs dev mailing lists constantly and consistently following the line of, the data is always right there is no need for a “fsck” (which I actually get) but it’s used to shut down every thread... the irony is I’m now installing windows 7 and SP1 on a usb stick (well it’s actually installed, but sp1 isn’t finished yet) so I can install a zfs data recovery tool which reports to be able to “walk the data” to retrieve all the files... the irony eh... install windows7 on a usb stick to recover a FreeBSD installed zfs filesystem... will let you know if the tool works, but as it was recommended by a dev I’m hopeful... have another array (with zfs I might add) loaded and ready to go... if the data recovery is successful I’ll blow away the original machine and work out what OS and drive setup will be safe for the data in the future. I might even put FreeBSD and zfs back on it, but if I do it won’t be in the current Zraid2 config.

Meh.

Hardware failure is, well, hardware failure. Yes, power-related failures are hardware failures.

Never mind the potential for /software/ failures. Bugs are, well, bugs. And they're a real thing. Never had the shortcomings of UFS bite you on an "unexpected" power loss? Well, I have. Is ZFS absolutely safe against any such event? No, but it's safe*r*.

I've yet to have ZFS lose an entire pool due to something bad happening, but the same basic risk (entire filesystem being gone) has occurred more than once in my IT career with other filesystems -- including UFS, lowly MSDOS and NTFS, never mind their predecessors all the way back to floppy disks and the first 5MB Winchesters. I learned a long time ago that two is one and one is none when it comes to data, and WHEN two becomes one you SWEAT, because that second failure CAN happen at the worst possible time.

As for RaidZ2 vs. mirrored, it's not as simple as you might think. Mirrored vdevs can only lose one member per mirror set, unless you use three-member mirrors. That sounds insane but actually it isn't in certain circumstances, such as very-read-heavy and high-performance-read environments.

The short answer is that a 2-way mirrored set is materially faster on reads but has no acceleration on writes, and can lose one member per mirror. If the SECOND one fails before you can resilver, and that resilver takes quite a long while if the disks are large, you're dead. However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each of a 2-way mirror) you now have three parallel data paths going at once and potentially six for reads -- and performance is MUCH better. A 3-way mirror can lose two members (and six drives could be organized as 3x2, that is, 2 vdevs each of a 3-way mirror) but obviously requires lots of drive slots and 3x as much *power* per gigabyte stored (and you pay for power twice: once to buy it and again to get the heat out of the room where the machine is).

Raidz2 can also lose 2 drives without being dead. However, it doesn't get any of the read performance improvement *and* takes a write performance penalty; Z2 has more write penalty than Z1 since it has to compute and write two parity entries instead of one, although in theory at least it can parallelize those parity writes -- albeit at the cost of drive bandwidth congestion (e.g. interfering with other accesses to the same disk at the same time). In short, RaidZx performs about as "well" as the *slowest* disk in the set. So why use it (particularly Z2) at all? Because for "N" drives you get the protection of a 3-way mirror and *much* more storage. With six 1TB drives, a RaidZ2 setup returns ~4TB of usable space, 2-way mirrors return 3TB, and 3-way mirrors (which provide the same protection against drive failure as Z2) return only 2TB, *half* the storage of Z2. IMHO ordinary Raidz isn't worth the trade-offs, but Z2 frequently is.

In addition, more spindles means more failures, all other things being equal, so if you need "X" TB of storage and organize it as 3-way mirrors you now have twice as many physical spindles as the RaidZ2 layout above, which means on average you'll take twice as many faults. If performance is more important then the choice is obvious. If density is more important (that is, a lot or even most of the data is rarely accessed at all) then the choice is fairly simple too. In many workloads you have some of both, and thus the correct choice is a hybrid arrangement; that's what I do here, because I have a lot of data that is rarely-to-never accessed and read-only but also have some data that is frequently accessed and frequently written. One size does not fit all in such a workload.

MOST systems, by the way, have this sort of paradigm (a huge percentage of the data is rarely read and never written) but it doesn't become economic or sane to try to separate them until you get well into the terabytes of storage range and a half-dozen or so physical volumes. There's a very clean argument that prior to that point, but with more than one drive, mirrored is always the better choice.

Note that if you have an *adapter* go insane (and as I've noted here I've had it happen TWICE in my IT career!) then *all* of the data on the disks served by that adapter is screwed.
It doesn't make a bit of difference what filesystem you're using in that scenario, and thus you had better have a backup scheme and make sure it works as well, never mind software bugs or administrator stupidity ("dd" as root to the wrong target, for example, will reliably screw you every single time!)

For a single-disk machine ZFS is no *less* safe than UFS and provides a number of advantages, with arguably the most important being easily-used snapshots. Not only do snapshots simplify backups, since coherency during the backup is never at issue and incremental backups become fast and easily done; in addition, boot environments make roll-forward and even *roll-back* reasonable to implement for software updates -- a critical capability if you ever run an OS version update and something goes seriously wrong with it. If you've never had that happen then consider yourself blessed; it's NOT fun to manage in a UFS environment and often winds up leading to a "restore from backup" scenario. (To be fair it can be with ZFS too if you're foolish enough to upgrade the pool before being sure you're happy with the new OS rev.)

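As a rough illustration of why snapshots make incremental backups easy, here is a minimal Python sketch that only *builds* the command lines for one backup cycle; it does not run them. The dataset, snapshot and target names are made up for the example, and the helper `incremental_backup_plan` is hypothetical, not part of any ZFS tooling. The only real commands assumed are `zfs snapshot`, `zfs send -i` and `zfs receive`.

```python
import shlex


def incremental_backup_plan(dataset, prev_snap, new_snap, target):
    """Return the shell commands for one incremental backup cycle.

    Dry run only: nothing is executed. All names passed in are
    illustrative; substitute your own pool/dataset names.
    """
    prev_full = f"{dataset}@{prev_snap}"
    new_full = f"{dataset}@{new_snap}"
    return [
        # 1. Take a point-in-time snapshot, so the backup source is
        #    coherent no matter what is being written meanwhile.
        "zfs snapshot " + shlex.quote(new_full),
        # 2. Send only the changes between the previous snapshot and the
        #    new one, and receive them into the backup dataset.
        "zfs send -i {} {} | zfs receive {}".format(
            shlex.quote(prev_full), shlex.quote(new_full), shlex.quote(target)
        ),
    ]


if __name__ == "__main__":
    # Hypothetical names: zroot/home backed up into backup/home.
    for cmd in incremental_backup_plan(
        "zroot/home", "2019-05-06", "2019-05-07", "backup/home"
    ):
        print(cmd)
```

Review the printed commands before running anything as root; the same caution that applies to "dd" applies here.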
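Going back to the mirror-versus-RaidZ2 comparison earlier in this message, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope model only: it assumes equal-sized 1TB drives (which the six-drive figures above imply), a single RaidZ vdev, and ignores metadata and padding overhead, so real pools return somewhat less. The function names are invented for the illustration.

```python
import math


def mirror_layout(drives, way, drive_tb=1.0):
    """Pool of `drives` disks arranged as striped `way`-way mirror vdevs."""
    if drives % way:
        raise ValueError("drive count must be a multiple of the mirror width")
    vdevs = drives // way
    return {
        "vdevs": vdevs,
        # Each mirror vdev stores one drive's worth of data.
        "usable_tb": vdevs * drive_tb,
        # Failures survivable in the worst case (all in the same vdev).
        "guaranteed_failures": way - 1,
    }


def raidz_layout(drives, parity, drive_tb=1.0):
    """Single RaidZ vdev with `parity` parity drives (1 = Z1, 2 = Z2)."""
    if drives <= parity:
        raise ValueError("need more drives than parity devices")
    return {
        "vdevs": 1,
        "usable_tb": (drives - parity) * drive_tb,
        "guaranteed_failures": parity,
    }


def mirror_drives_needed(target_tb, way, drive_tb=1.0):
    """Spindles needed to store `target_tb` on `way`-way mirrors."""
    return way * math.ceil(target_tb / drive_tb)


def raidz_drives_needed(target_tb, parity, drive_tb=1.0):
    """Spindles needed to store `target_tb` on one RaidZ vdev."""
    return math.ceil(target_tb / drive_tb) + parity


if __name__ == "__main__":
    print("6-disk raidz2 :", raidz_layout(6, 2))
    print("3 x 2-way     :", mirror_layout(6, 2))
    print("2 x 3-way     :", mirror_layout(6, 3))
    print("spindles for 4TB, 3-way mirrors:", mirror_drives_needed(4, 3))
    print("spindles for 4TB, raidz2       :", raidz_drives_needed(4, 2))
```

For 4TB of data the model gives 12 spindles for 3-way mirrors against 6 for RaidZ2, which is where the "twice as many spindles, twice as many faults" point comes from. It says nothing about performance, which is the other half of the trade-off.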
-- 
Karl Denninger
karl@denninger.net <mailto:karl@denninger.net>
/The Market Ticker/
/[S/MIME encrypted email preferred]/