Date: Tue, 30 Apr 2019 23:38:34 +1000
From: Michelle Sullivan <michelle@sorbs.net>
To: Karl Denninger <karl@denninger.net>, freebsd-stable@freebsd.org
Subject: Re: ZFS...
Message-ID: <aab20556-07a4-bc58-d5e8-d2f0366eb77e@sorbs.net>
In-Reply-To: <f868b452-40e9-f2c8-cdee-dde5e53a214c@denninger.net>
References: <30506b3d-64fb-b327-94ae-d9da522f3a48@sorbs.net>
 <CAOtMX2gf3AZr1-QOX_6yYQoqE-H+8MjOWc=eK1tcwt5M3dCzdw@mail.gmail.com>
 <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net>
 <3d0f6436-f3d7-6fee-ed81-a24d44223f2f@netfence.it>
 <17B373DA-4AFC-4D25-B776-0D0DED98B320@sorbs.net>
 <f868b452-40e9-f2c8-cdee-dde5e53a214c@denninger.net>
Karl Denninger wrote:
> On 4/30/2019 03:09, Michelle Sullivan wrote:
>> Consider..
>>
>> If one triggers such a fault on a production server, how can one
>> justify transferring from backup multiple terabytes (or even petabytes
>> now) of data to repair an unmountable/faulted array... because all
>> backup solutions I know of currently would take days, if not weeks, to
>> restore a store of the size ZFS is touted as supporting.
>
> Had it happen on a production server a few years back with ZFS. The
> *hardware* went insane (disk adapter) and scribbled on *all* of the
> vdevs.
>
> The machine crashed and would not come back up -- at all. I insist on
> (and had) emergency boot media physically in the box (a USB key) in any
> production machine, and it was quite quickly obvious that all of the
> vdevs were corrupted beyond repair. There was no rational option other
> than to restore.
>
> It was definitely not a pleasant experience, but this is why, once you
> get into systems and data store sizes where a restore is a five-alarm
> pain in the neck, you must figure out some sort of strategy that covers
> you 99% of the time without a large amount of downtime involved, and in
> the 1% case accept said downtime. In this particular circumstance the
> customer originally didn't want to spend on a doubled,
> transaction-level-protected on-site (same DC) redundancy setup, so a
> restore -- as opposed to fail-over/promote, then restore and build a
> new "redundant" box where the old "primary" resided -- was the
> most viable option. Time to recover essential functions was ~8 hours
> (and over 24 hours for everything to be restored.)
>
How big was the storage area?

--
Michelle Sullivan
http://www.mhix.org/
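[For readers following the thread: the triage Karl describes -- booting
rescue media and deciding whether the vdevs are salvageable -- typically
comes down to a few standard ZFS commands. This is a minimal sketch, not
something from the thread itself; it assumes a FreeBSD rescue shell, and
the pool name "tank" and the device path are hypothetical.]

    # List pools visible to this system and the reported health of
    # their vdevs, without actually importing anything:
    zpool import

    # Dump the on-disk vdev labels directly (device path hypothetical);
    # a healthy device carries four valid labels, and garbage here
    # means the device has been overwritten:
    zdb -l /dev/da0p3

    # Last resort: a rewind import that discards the newest
    # transactions. With -n it is a dry run only; drop -n to attempt
    # the rewind for real:
    zpool import -F -n tank

If the labels themselves are destroyed on every vdev -- as in the
adapter failure described above -- no rewind can help, and restoring
from backup is the only remaining option.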