From: Michelle Sullivan <michelle@sorbs.net>
To: Alan Somers
Cc: freebsd-stable
Subject: Re: ZFS...
Date: Tue, 30 Apr 2019 10:41:17 +1000
Message-id: <56833732-2945-4BD3-95A6-7AF55AB87674@sorbs.net>
Comments inline..

Michelle Sullivan
http://www.mhix.org/

Sent from my iPad

> On 30 Apr 2019, at 03:06, Alan Somers wrote:
>
>> On Mon, Apr 29, 2019 at 10:23 AM Michelle Sullivan wrote:
>>
>> I know I'm not going to be popular for this, but I'll just drop it here
>> anyhow.
>>
>> http://www.michellesullivan.org/blog/1726
>>
>> Perhaps one should reconsider either:
>>
>> 1. Looking at tools that may be able to recover corrupt ZFS metadata, or
>> 2. Defaulting to non-ZFS filesystems on install.
>>
>> --
>> Michelle Sullivan
>> http://www.mhix.org/
>
> Wow, losing multiple TB sucks for anybody.  I'm sorry for your loss.
> But I want to respond to a few points from the blog post.
>
> 1) When ZFS says that "the data is always correct and there's no need
> for fsck", they mean metadata as well as data.  The spacemap is
> protected in exactly the same way as all other data and metadata (to
> be pedantically correct, the labels and uberblocks are protected in a
> different way, but still protected).  The only way to get metadata
> corruption is due to a disk failure (a 3-disk failure when using
> RAIDZ2), or due to a software bug.  Sadly, those do happen, and
> they're devilishly tricky to track down.  The difference between ZFS
> and older filesystems is that older filesystems experience corruption
> during power loss _by_design_, not merely due to software bugs.
> A perfectly
> functioning UFS implementation will experience corruption during power
> loss, and that's why it needs to be fscked.  It's not just
> theoretical, either.  I use UFS on my development VMs, and they
> frequently experience corruption after a panic (which happens all the
> time because I'm working on kernel code).

I know, which is why I have ZVOLs with UFS filesystems in them for the
development VMs... In a perfect world the power would have been all
good, the UPSes would not be damaged, and the generator would not have
run out of fuel because of the extended outage... in fact, if it were a
perfect world I wouldn't have my own mini DC at home.

> 2) Backups are essential with any filesystem, not just ZFS.  After
> all, no amount of RAID will protect you from an accidental "rm -rf /".

You only do it once... I did it back in 1995... I haven't ever done it
again.

> 3) ZFS hot spares can be swapped in automatically, though they aren't
> by default.  It sounds like you already figured out how to assign a
> spare to the pool.  To use it automatically, you must set the
> "autoreplace" pool property and enable zfsd.  The latter can be done
> with "sysrc zfsd_enable="YES"".

The system was originally built on 9.0 and got upgraded throughout the
years... zfsd was not available back then.  So I get your point, but
maybe you didn't realize this blog was a history of 8+ years?

> 4) It sounds like you're having a lot of power trouble.  Have you
> tried sysutils/apcupsd from ports?

I did... Malta was notorious for it.  Hence the 6kVA UPSes in the bottom
of each rack (4 racks), cross-connected with the rack next to it, and a
backup generator... Australia, on the other hand, is a lot more stable
(at least where I am)... 2 power issues in 2 years... both within 10
hours... one was a transformer, the other when some idiot took out a
power pole (and I mean actually took it out; it was literally snapped in
half...
how they got out of the car and did a runner before the police or ambos
got there I'll never know.)

> It's fairly handy.  It can talk to
> a wide range of UPSes, and can be configured to do stuff like send you
> an email on power loss, and power down the server if the battery gets
> too low.

It could have helped with some of this... all 4 UPSes are toast now.
One caught fire, one no longer detects AC input, and the other two I'm
not even trying after the first one caught fire... the lot are being
replaced on insurance.

It's a catalog of errors that most wouldn't normally experience.
However, it does show (to me) that ZFS on everything is a really bad
idea... particularly for home users, where there is unknown hardware and
you know they will mistreat it... they certainly won't have ECC RAM in
laptops, etc... unknown caching facilities, etc... it's a recipe for
losing the root drive...

Regards,

Michelle
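For anyone wanting the same arrangement as the ZVOL-backed UFS disks
mentioned above for the development VMs, it amounts to something like this
(a sketch only; the pool name "tank" and volume name "vm0" are made-up
placeholders):

```shell
# Carve a 20G ZVOL out of an existing pool ("tank"/"vm0" are placeholders)
zfs create -V 20G tank/vm0

# Put a UFS filesystem (with soft updates) on the zvol's device node
newfs -U /dev/zvol/tank/vm0

# The device can then be handed to a VM as its disk, or mounted directly:
mount /dev/zvol/tank/vm0 /mnt
```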
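For reference, the autoreplace setup Alan describes boils down to a few
commands on a current FreeBSD (a sketch; "tank" and "ada3" are placeholder
pool and device names):

```shell
# Let zfsd handle fault events and spare activation (FreeBSD 10.2 and later)
sysrc zfsd_enable="YES"
service zfsd start

# Tell the pool to use an available spare automatically when a vdev faults
zpool set autoreplace=on tank

# Attach a hot spare to the pool
zpool add tank spare ada3
```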
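And since apcupsd came up: getting it going on FreeBSD is roughly the
usual package-plus-rc.conf dance (an untested sketch; adjust the config
for your UPS before starting the daemon):

```shell
# Install apcupsd and enable it at boot
pkg install -y apcupsd
sysrc apcupsd_enable="YES"

# UPS type, cable, shutdown thresholds, and mail-on-powerfail behaviour
# are set in /usr/local/etc/apcupsd/apcupsd.conf before starting it
service apcupsd start
```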