From owner-freebsd-stable@freebsd.org Wed May 8 01:12:54 2019
Subject: Re: ZFS...
From: Joe Maloney
Date: Tue, 7 May 2019 21:12:50 -0400
To: Michelle Sullivan
Cc: Karl Denninger, freebsd-stable@freebsd.org

You might look at UFS Explorer. It claims to have ZFS support now. It costs money for a license, and I think it required Windows the last time I used it. I can attest that a previous version allowed me to recover all the data I needed from a lost UFS mirror almost a decade ago.

Sent from my iPhone

> On May 7, 2019, at 9:01 PM, Michelle Sullivan wrote:
>
> Karl Denninger wrote:
>>> On 5/7/2019 00:02, Michelle Sullivan wrote:
>>> The problem I see with that statement is that the zfs dev mailing lists constantly and consistently follow the line of "the data is always right, there is no need for a fsck" (which I actually get), but it's used to shut down every thread...
>>> the irony is I'm now installing windows 7 and SP1 on a usb stick (well, it's actually installed, but SP1 isn't finished yet) so I can install a zfs data recovery tool which reports to be able to "walk the data" to retrieve all the files... the irony, eh... install windows 7 on a usb stick to recover a FreeBSD-installed zfs filesystem... will let you know if the tool works, but as it was recommended by a dev I'm hopeful... have another array (with zfs I might add) loaded and ready to go... if the data recovery is successful I'll blow away the original machine and work out what OS and drive setup will be safe for the data in the future. I might even put FreeBSD and zfs back on it, but if I do it won't be in the current Zraid2 config.
>> Meh.
>>
>> Hardware failure is, well, hardware failure. Yes, power-related failures are hardware failures.
>>
>> Never mind the potential for /software/ failures. Bugs are, well, bugs. And they're a real thing. Never had the shortcomings of UFS bite you on an "unexpected" power loss? Well, I have. Is ZFS absolutely safe against any such event? No, but it's safe*r*.
>
> Yes and no ... I'll explain...
>
>>
>> I've yet to have ZFS lose an entire pool due to something bad happening, but the same basic risk (entire filesystem being gone)
>
> Every time I have seen this issue (and it's been more than once - though until now recoverable, even if extremely painful) it has always been during a resilver of a failed drive with something else happening... a panic, another drive failure, power, etc. Any other time it's rock solid... which is the yes and no... under normal circumstances zfs is very, very good and seems as safe as or safer than UFS... but my experience is that ZFS has one really bad flaw: if there is corruption in the metadata - even if the stored data is 100% correct - it will fault the pool and that's it, it's gone, barring some luck and painful recovery (backups aside)... other file systems also suffer from this, but there are tools that will, the *majority of the time*, get you out of the s**t with little pain. Barring this windows-based tool I haven't been able to run yet, zfs appears to have nothing.
>
>> has occurred more than once in my IT career with other filesystems -- including UFS, lowly MSDOS and NTFS, never mind their predecessors all the way back to floppy disks and the first 5MB Winchesters.
>
> Absolutely, been there, done that... and btrfs... *ouch*, still as bad. However, with the only btrfs install I had (I didn't know it was btrfs underneath, but a netgear NAS...) I was still able to recover the data, even though it had screwed the file system so badly that I vowed never to consider or use it again on anything, ever...
>
>>
>> I learned a long time ago that two is one and one is none when it comes to data, and WHEN two becomes one you SWEAT, because that second failure CAN happen at the worst possible time.
>
> and does..
>
>>
>> As for RaidZ2 .vs. mirrored it's not as simple as you might think. Mirrored vdevs can only lose one member per mirror set, unless you use three-member mirrors. That sounds insane but actually it isn't in certain circumstances, such as very-read-heavy and high-performance-read environments.
>
> I know - this is why I don't use mirrored - because wear patterns will ensure both sides of the mirror are closely matched.
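
An aside on the "zfs appears to have nothing" point: before reaching for a third-party tool it may be worth trying the recovery options that do ship with ZFS. None of them is a fsck and none is guaranteed to help with damaged metadata, but as a rough sketch (the pool name "tank" is a placeholder):

    # Dry run: report whether discarding the last few transaction groups
    # would make the pool importable, without actually doing it.
    zpool import -F -n tank

    # Attempt the rewind recovery for real.
    zpool import -F tank

    # Import read-only so nothing further is written while data is copied off.
    zpool import -o readonly=on tank

    # Inspect the exported (not yet imported) pool's metadata with zdb.
    zdb -e tank
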
>
>>
>> The short answer is that a 2-way mirrored set is materially faster on reads but has no acceleration on writes, and can lose one member per mirror. If the SECOND one fails before you can resilver, and that resilver takes quite a long while if the disks are large, you're dead. However, if you do six drives as a 2x3 way mirror (that is, 3 vdevs each of a 2-way mirror) you now have three parallel data paths going at once and potentially six for reads -- and performance is MUCH better. A 3-way mirror can lose two members (and could be organized as 3x2) but obviously requires lots of drive slots and 3x as much *power* per gigabyte stored (and you pay for power twice; once to buy it and again to get the heat out of the room where the machine is.)
>
> my problem (as always) is slots, not so much the power.
>
>>
>> Raidz2 can also lose 2 drives without being dead. However, it doesn't get any of the read performance improvement *and* takes a write performance penalty; Z2 has more write penalty than Z1 since it has to compute and write two parity entries instead of one, although in theory at least it can parallel those parity writes -- albeit at the cost of drive bandwidth congestion (e.g. interfering with other accesses to the same disk at the same time.) In short RaidZx performs about as "well" as the *slowest* disk in the set.
> Which is why I built mine with identical drives (though different production batches :) )... the majority of the data in my storage array is write once (or twice), read many.
>
>> So why use it (particularly Z2) at all? Because for "N" drives you get the protection of a 3-way mirror and *much* more storage. A six-member RaidZ2 setup returns ~4TB of usable space, where a 2-way mirror returns 3TB and a 3-way mirror (which provides the same protection against drive failure as Z2) gives you only *half* the storage. IMHO ordinary Raidz isn't worth the trade-offs, but Z2 frequently is.
>>
>> In addition, more spindles means more failures, all other things being equal, so if you need "X" TB of storage and organize it as 3-way mirrors you now have twice as many physical spindles, which means on average you'll take twice as many faults. If performance is more important then the choice is obvious. If density is more important (that is, a lot or even most of the data is rarely accessed at all) then the choice is fairly simple too. In many workloads you have some of both, and thus the correct choice is a hybrid arrangement; that's what I do here, because I have a lot of data that is rarely-to-never accessed and read-only but also some data that is frequently accessed and frequently written. One size does not fit all in such a workload.
> This is where I came to 2 systems (with different data)... one was for density, the other performance. Storage vs working, etc.
>
>> MOST systems, by the way, have this sort of paradigm (a huge percentage of the data is rarely read and never written) but it doesn't become economic or sane to try to separate them until you get well into the terabytes of storage range and a half-dozen or so physical volumes. There's a very clean argument that prior to that point, with more than one drive, mirrored is always the better choice.
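
To make those layouts and capacity figures concrete, a rough sketch assuming six 1 TB disks (device names da0-da5 and the pool name "tank" are placeholders):

    # Three 2-way mirror vdevs ("2x3"): ~3 TB usable, one disk per vdev may
    # fail, best read parallelism.
    zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5

    # Two 3-way mirror vdevs: ~2 TB usable, two disks per vdev may fail --
    # half the RaidZ2 capacity for the same failure tolerance.
    zpool create tank mirror da0 da1 da2 mirror da3 da4 da5

    # One six-disk RaidZ2 vdev: (6 - 2) x 1 TB = ~4 TB usable, any two disks
    # may fail, but writes pay the double-parity cost.
    zpool create tank raidz2 da0 da1 da2 da3 da4 da5
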
>> Note that if you have an *adapter* go insane (and as I've noted here I've had it happen TWICE in my IT career!) then *all* of the data on the disks served by that adapter is screwed.
>
> 100% with you - been there, done that... and it doesn't matter what OS or filesystem; hardware failure where silent data corruption happens because of an adapter will always take you out (and zfs will not save you in many cases of that either.)
>>
>> It doesn't make a bit of difference what filesystem you're using in that scenario and thus you had better have a backup scheme and make sure it works as well, never mind software bugs or administrator stupidity ("dd" as root to the wrong target, for example, will reliably screw you every single time!)
>>
>> For a single-disk machine ZFS is no *less* safe than UFS and provides a number of advantages, with arguably the most important being easily-used snapshots.
>
> Depends... in normal operation I agree... but when it comes to all or nothing, that is a matter of perspective. Personally I prefer to have in-place recovery options and/or multiple *possible* recovery options rather than... "destroy the pool and recreate it from scratch, hope you have backups"...
>
>> Not only does this simplify backups, since coherency during the backup is never at issue and incremental backups become fast and easily done; in addition, boot environments make roll-forward and even *roll-back* reasonable to implement for software updates -- a critical capability if you ever run an OS version update and something goes seriously wrong with it. If you've never had that happen then consider yourself blessed;
>
> I have been there (especially in the early days (pre-0.83 kernel) versions of Linux :) )
>
>> it's NOT fun to manage in a UFS environment and often winds up leading to a "restore from backup" scenario. (To be fair it can be with ZFS too if you're foolish enough to upgrade the pool before being sure you're happy with the new OS rev.)
>>
> Actually I have a simple way with UFS (and ext2/3/4 etc.)... split the boot disk almost down the center: create 3 partitions - root, swap, altroot. root and altroot are almost identical; one is always active, the new OS goes on the other, and you switch to make the other active (primary) when you've tested it... it only gives one level of roll forward/roll back, but it works for me and has never failed (boot disk/OS wise) since I implemented it... but then I don't let anyone else in the company have root access, so they cannot dd or "rm -r . /" or "rm -r .*" (both of which are the only ways I have done that before - back in 1994, and never since - it's something you learn or you get out of IT :P ... and for those who didn't get the latter, it should have been 'rm -r .??*' - and why are you on '-stable'...? :P )
>
> Regards,
>
> --
> Michelle Sullivan
> http://www.mhix.org/
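
On the snapshot and boot-environment points above, a minimal sketch of the workflow being described, assuming ZFS-on-root with bectl(8) available (dataset, pool, and host names are placeholders):

    # Coherent point-in-time snapshot, then full and incremental backup streams.
    zfs snapshot tank/data@monday
    zfs send tank/data@monday | ssh backuphost zfs receive backup/data
    zfs snapshot tank/data@tuesday
    zfs send -i @monday tank/data@tuesday | ssh backuphost zfs receive backup/data

    # Boot environments: keep a fallback before an OS upgrade; if the new
    # version misbehaves, activate the old environment and reboot.
    bectl create pre-upgrade
    # ... upgrade, reboot, test ...
    bectl activate pre-upgrade
    # And hold off on "zpool upgrade" until you're sure you're keeping the new OS.
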