From: Michelle Sullivan
To: Walter Cramer
Cc: Karl Denninger, freebsd-stable@freebsd.org
Subject: Re: ZFS...
Date: Wed, 01 May 2019 09:18:35 +1000

Walter Cramer wrote:
> Brief "Old Man" summary/perspective here...
>
> Computers and hard drives are complex, sensitive physical things.
> They, or the data on them, can be lost to fire, flood, lightning
> strikes, theft, transportation screw-ups, and more. Mass data
> corruption by faulty hardware or software is mostly rare, but does
> happen. Then there's the users - authorized or not - who are inept or
> malicious.

Yup.

>
> You can spend a fortune to make loss of the "live" data in your home
> server / server room / data center very unlikely. Is that worth the
> time and money? Depends on the business case. At any scale, it's
> best to have a manager - who understands both computers and the bottom
> line - keep a close eye on this.
>

That would sorta be my point... (and yet the default FreeBSD install - I
can't remember which version, it could be everything current - puts
everything on a filesystem that relies on perfect (or near-perfect)
hardware when, you know, we all know that the target hardware is
consumer grade).

I have 2 almost identical machines here; the only differences are the
motherboard, CPU and RAM. The cases are identical, the PSUs are too, and
even the drives and controllers are (for the ZFS part at least). Both
are Supermicro cases with dual PSUs, both have 16x ST30000VN* drives
(IronWolf NAS drives), both have LSI HBAs, and both have two Kingston
128GB flash drives mirrored for the base OS. One has 32GB non-ECC RAM
and onboard RAID for the OS drives; the other has a Supermicro board,
16GB ECC RAM and FreeBSD (GEOM) mirroring for the OS drives.

> "Real" protection from data loss means multiple off-site and generally
> off-line backups. You could spend a fortune on that, too...but for
> your use case (~21TB in an array that could hold ~39TB, and what
> sounds like a "home power user" budget), I'd say to put together two
> "backup servers" - cheap little (aka transportable) FreeBSD systems
> with, say 7x6GB HD's, raidz1.

At the time, 6TB ("T" :) ) drives were not available - 3s and 4s were
the largest available... it's just rolled on with replacements since.

> With even a 1Gbit ethernet connection to your main system, savvy use
> of (say) rsync (net/rsync in Ports), and the sort of "know your data /
> divide & conquer" tactics that Karl mentions, you should be able to
> complete initial backups (on both backup servers) in <1 month. After
> that - rsync can generally do incremental backups far, far faster.
> How often you gently haul the backup servers to/from your off-site
> location(s) depends on a bunch of factors - backup frequency, cost of
> bandwidth, etc.

2x bonded gigabit connections on each server.
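For a rough sense of scale, the Python sketch below turns the figures above (about 21TB of data, Walter's single 1Gbit link, or two bonded gigabit links per server) into days of continuous transfer. The `efficiency` factor is an assumption standing in for protocol overhead, disk speed and rsync's own work; real throughput will sit below line rate. Link aggregation also typically balances traffic per flow, so a single rsync stream would usually still see about 1Gbit; the 2Gbit row assumes several transfers running in parallel.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Days of continuous transfer needed to move `data_tb` terabytes
    (10**12 bytes) over a link of `link_mbit_s` megabits/s (10**6 bits/s).

    `efficiency` is an assumed fraction of line rate actually achieved
    (0 < efficiency <= 1), covering protocol overhead, disk speed, etc.
    """
    if link_mbit_s <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8  # terabytes -> bits
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    data_tb = 21  # approximate data size mentioned in the thread
    links = {
        "1Gbit": 1_000,
        "2x1Gbit bonded (parallel streams)": 2_000,
    }
    for label, mbit in links.items():
        for eff in (1.0, 0.5):
            days = transfer_days(data_tb, mbit, eff)
            print(f"{label:34} efficiency {eff:.0%}: {days:5.1f} days")
```

Even at half of line rate, a full ~21TB copy over a single gigabit link comes out at roughly four days of continuous transfer, well inside the "<1 month" estimate above; the same function can be pointed at any other link speed.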
Offsite backup is not possible (feasible) due to the Australian mess
that they call broadband (12MBps max - running down a hill, on a good
day with the wind behind you, and 30 Hail Marys every 20 yards).

>
> Never skimp on power supplies.

Hence the dual-PSU Supermicro cases, with dual 6kVA HP UPSs whose
batteries are replaced every 36 months, and a generator.

>
> -Walter
>
> [Credits: Nothing above is original. Others have already made most
> of my points in this thread. It's pretty much all decades-old
> computer wisdom in any case.]

Yup, I know the drill.

Michelle

>
>
> On Tue, 30 Apr 2019, Michelle Sullivan wrote:
>
>> Karl Denninger wrote:
>>> On 4/30/2019 05:14, Michelle Sullivan wrote:
>>>>> On 30 Apr 2019, at 19:50, Xin LI wrote:
>>>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan wrote:
>>>>>> but in my recent experience 2 issues colliding at the same time
>>>>>> results in disaster
>>>>> Do we know exactly what kind of corruption happen to your pool?
>>>>> If you see it twice in a row, it might suggest a software bug
>>>>> that should be investigated.
>>>>
>>>> All I know is it's a checksum error on a metaslab (122), and from
>>>> what I can gather it's the spacemap that is corrupt... but I am no
>>>> expert. I don't believe it's a software fault as such, because this
>>>> was caused by a hard outage (damaged UPSes) whilst resilvering a
>>>> single (but completely failed) drive. ...and after the first outage
>>>> a second occurred (same as the first but more damaging to the power
>>>> hardware)... the host itself was not damaged, nor were the drives
>>>> or controller.
>>> .....
>>>>> Note that ZFS stores multiple copies of its essential metadata,
>>>>> and in my experience with my old, consumer grade crappy hardware
>>>>> (non-ECC RAM, with several faulty, single hard drive pool: bad
>>>>> enough to crash almost monthly and damages my data from time to
>>>>> time),
>>>> This was a top end consumer grade motherboard with non-ECC RAM that
>>>> had been running for 8+ years without fault (except for hard drive
>>>> platter failures). Uptime would have been years if it wasn't for
>>>> patching.
>>> Yuck.
>>>
>>> I'm sorry, but that may well be what nailed you.
>>>
>>> ECC is not just about the random cosmic ray. It also saves your bacon
>>> when there are power glitches.
>>
>> No. Sorry, no. If the data is only half written to disk, ECC isn't
>> going to save you at all... it's all about power on the drives to
>> complete the write.
>>>
>>> Unfortunately however there is also cache memory on most modern hard
>>> drives, most of the time (unless you explicitly shut it off) it's on
>>> for write caching, and it'll nail you too. Oh, and it's never, in my
>>> experience, ECC.
>>
>> No comment on that - you're right in the first part, I can't comment
>> on whether there are drives with ECC.
>>
>>>
>>> In addition, however, and this is something I learned a LONG time ago
>>> (think Z-80 processors!) is that as in so many very important things
>>> "two is one and one is none."
>>>
>>> In other words without a backup you WILL lose data eventually, and it
>>> WILL be important.
>>>
>>> Raidz2 is very nice, but as the name implies it you have two
>>> redundancies. If you take three errors, or if, God forbid, you *write*
>>> a block that has a bad checksum in it because it got scrambled while in
>>> RAM, you're dead if that happens in the wrong place.
>>
>> Or in my case, you write partial data, therefore invalidating the
>> checksum...
>>>
>>>> Yeah... unlike UFS, which has to get really, really hosed before you
>>>> must restore from backup with nothing recoverable, it seems ZFS can
>>>> get hosed when issues occur in just the wrong bit... but mostly it
>>>> is recoverable (and my experience has been some nasty shit that
>>>> always ended up being recoverable).
>>>>
>>>> Michelle
>>> Oh that is definitely NOT true.... again, from hard experience,
>>> including (but not limited to) on FreeBSD.
>>>
>>> My experience is that ZFS is materially more-resilient but there is no
>>> such thing as "can never be corrupted by any set of events."
>>
>> The latter part is true - and my blog and my current situation are not
>> limited to or aimed at FreeBSD specifically; FreeBSD is just my
>> experience. The former part... it has been very resilient, but I
>> think (based on this particular set of events) it is easily
>> corruptible and I have just been lucky. You just have to hit a certain
>> write to trigger the issue, and whilst that write and issue might be
>> very, very difficult (read: hit and miss) to hit in normal everyday
>> scenarios, it can and will eventually happen.
>>
>>> Backup strategies for moderately large (e.g. many Terabytes) to very
>>> large (e.g. Petabytes and beyond) get quite complex but they're also
>>> very necessary.
>>>
>> And therein lies the problem. If you don't have a backup solution
>> costing many tens of thousands of dollars, you're either:
>>
>> 1/ down for a looooong time.
>> 2/ losing all data and starting again...
>>
>> ..and that's the problem... with UFS you can recover most (in most
>> situations), and provided the *data* is there uncorrupted by the
>> fault you can get it all off with various tools even if it is a
>> complete mess.... here I am with the data that is apparently OK, but
>> the metadata is corrupt (and note: as I had stopped writing to the
>> drive when it started resilvering, the data - all of it - should be
>> intact... even if a mess.)
>> >> Michelle >> >> -- >> Michelle Sullivan >> http://www.mhix.org/ >> >> _______________________________________________ >> freebsd-stable@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable >> To unsubscribe, send any mail to >> "freebsd-stable-unsubscribe@freebsd.org" >> -- Michelle Sullivan http://www.mhix.org/