Date: Mon, 21 Jan 2013 12:12:45 +0100 (CET)
From: Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>
To: Zaphod Beeblebrox <zbeeble@gmail.com>
Cc: freebsd-fs <freebsd-fs@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Message-ID: <alpine.BSF.2.00.1301211201570.9447@wojtek.tensor.gdynia.pl>
In-Reply-To: <CACpH0Mf6sNb8JOsTzC+WSfQRB62+Zn7VtzEnihEKmEV2aO2p+w@mail.gmail.com>
References: <CACpH0Mf6sNb8JOsTzC+WSfQRB62+Zn7VtzEnihEKmEV2aO2p+w@mail.gmail.com>
> Please don't misinterpret this post: ZFS's ability to recover from fairly
> catastrophic failures is pretty stellar, but I'm wondering if there can be
> a little room for improvement.

From my testing it is exactly the opposite. You have to see the difference
between marketing and reality.

> I use RAID pretty much everywhere. I don't like to lose data and disks
> are cheap. I have a fair amount of experience with all flavors ... and ZFS
> has become a go-to filesystem for most of my applications.

Just like me. And because I want performance and, as you described, disks
are cheap, I use RAID-1 (gmirror).

My applications don't tolerate low performance, overcomplexity, or a high
risk of data loss. That's why I use properly tuned UFS and gmirror, and
prefer multiple filesystems over gstripe.

> One of the best recommendations I can give for ZFS is its
> crash-recoverability.

Which is marketing, not truth. If you want bullet-proof recoverability,
UFS beats everything I've ever seen. If you want FAST crash recovery, use
soft updates + journaling, available in FreeBSD 9.

> As a counter example, if you have most hardware RAID going or a software
> whole-disk RAID, after a crash it will generally declare one disk as good
> and the other disk as "to be repaired" ... after which a full surface scan
> of the affected disks --- reading one and writing the other --- ensues.

True. gmirror does this too, but you can defer the mirror rebuild, which is
what I do. I have a script that sends me a mail when a gmirror is degraded,
and, after finding the cause of the problem and possibly replacing the disk,
I run the rebuild after work hours, so no slowdown is experienced.

> ZFS is smart on this point: it will recover on reboot with a minimum amount
> of fuss. Even if you dislodge a drive ... so that it's missing the last
> 'n' transactions, ZFS seems to figure this out (which I thought was extra
> kudos).

Yes, this is the marketing. Practice is somewhat different, as you
discovered yourself.
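The deferred-rebuild workflow described above could look something like the
sketch below. The original post does not show its script, so the mail
address, the awk-based status parsing, and the example rebuild command at
the end are all assumptions, not the author's actual code.

```shell
#!/bin/sh
# Sketch: cron job that mails the admin when any gmirror volume is
# degraded. MAILTO is a placeholder address -- adjust for your setup.

MAILTO="root@localhost"

degraded_mirrors() {
    # Read `gmirror status` output on stdin; print the name of every
    # volume whose status column says DEGRADED.
    awk '$2 == "DEGRADED" { print $1 }'
}

status=$(gmirror status 2>/dev/null || true)
bad=$(printf '%s\n' "$status" | degraded_mirrors)

if [ -n "$bad" ]; then
    printf 'Degraded gmirror volume(s):\n%s\n' "$bad" |
        mail -s "gmirror DEGRADED on $(hostname)" "$MAILTO"
fi

# Later, after work hours and any disk replacement, the rebuild is
# started by hand, e.g.:  gmirror rebuild gm0 ada1
```

Run from cron every few minutes; kicking off `gmirror rebuild` manually in
the evening keeps the rebuild I/O load out of working hours.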
> MY PROBLEM comes from problems that scrub can fix.
>
> Let's talk, in specific, about my home array. It has 9x 1.5T and 8x 2T in
> a RAID-Z configuration (2 sets, obviously).

While RAID-Z is already a king of bad performance, I assume you mean two
pools, not two RAID-Z sets in one pool. If you mixed two different RAID-Z
sets in one pool, you would spread the load unevenly and make performance
even worse.

> A full scrub of my drives weighs in at 36 hours or so.

Which is funny, as ZFS is marketed as doing this efficiently (checking only
used space). dd if=/dev/disk of=/dev/null bs=2m would take no more than a
few hours per disk, and you can read all the disks in parallel.

> vr2/cvs:<0x1c1>
>
> Now ... this is just an example: after each scrub, the hex number was
> before the old error was cleared. Then this new error gets similarly
> cleared by the next scrub. It seems that if the scrub returned to this new
> found error after fixing the "known" errors, this could save whole new
> scrub runs from being required.

It seems like scrub simply does not do its work right.

Even better: use UFS, for both bullet-proof recoverability and performance.
If you need help with tuning you may ask me privately.
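The "few hours, all in parallel" claim above is easy to time yourself: read
every disk end-to-end with dd, with all reads running concurrently. The
wrapper function and the device names in the example are illustrative, not
from the post.

```shell
#!/bin/sh
# Sketch: raw end-to-end read of several disks in parallel with dd,
# as a rough lower bound on how long a full-surface pass takes.
# Device names in the example below are placeholders.

surface_read() {
    # surface_read BS DEV [DEV ...]
    # Read each device completely, discarding the data. All reads run
    # concurrently, so wall time is bounded by the slowest disk.
    bs=$1; shift
    for dev in "$@"; do
        dd if="$dev" of=/dev/null bs="$bs" 2>/dev/null &
    done
    wait
}

# Example (FreeBSD dd accepts the lowercase "m" suffix):
#   time surface_read 2m /dev/ada0 /dev/ada1 /dev/ada2
```

Note this only checks that every sector is readable; unlike scrub it
verifies no checksums and repairs nothing, so it is a timing baseline, not
a replacement.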