From owner-freebsd-stable@freebsd.org Tue Apr 30 13:38:47 2019
Subject: Re: ZFS...
To: Karl Denninger, freebsd-stable@freebsd.org
From: Michelle Sullivan <michelle@sorbs.net>
Date: Tue, 30 Apr 2019 23:38:34 +1000

Karl Denninger wrote:
> On 4/30/2019 03:09, Michelle Sullivan wrote:
>> Consider..
>>
>> If one triggers such a fault on a production server, how can one
>> justify transferring from backup multiple terabytes (or even petabytes
>> now) of data to repair an unmountable/faulted array... because all
>> backup solutions I know currently would take days if not weeks to
>> restore the sort of store ZFS is touted with supporting.
> Had it happen on a production server a few years back with ZFS. The
> *hardware* went insane (disk adapter) and scribbled on *all* of the
> vdevs.
>
> The machine crashed and would not come back up -- at all. I insist on
> (and had) emergency boot media physically in the box (a USB key) in any
> production machine, and it was quite quickly obvious that all of the
> vdevs were corrupted beyond repair.
> There was no rational option other than to restore.
>
> It was definitely not a pleasant experience, but this is why, when you
> get into systems and data store sizes where a restore is a five-alarm
> pain in the neck, you must figure out some sort of strategy that covers
> you 99% of the time without a large amount of downtime involved, and in
> the 1% case accept said downtime. In this particular circumstance the
> customer didn't originally want to spend on a doubled,
> transaction-level-protected, on-site (in the same DC) redundancy setup,
> so a restore, as opposed to failing over/promoting and then restoring
> and building a new "redundant" box where the old "primary" resided, was
> the most viable option. Time to recover essential functions was ~8
> hours (and over 24 hours for everything to be restored.)
>
How big was the storage area?

--
Michelle Sullivan
http://www.mhix.org/