Date: Sun, 20 Jan 2013 14:34:59 -0800
From: Michael DeMan <freebsd@deman.com>
To: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS regimen: scrub, scrub, scrub and scrub again.
Message-ID: <DE0139FC-298B-4CC9-8FBB-9A2F345AB8D3@deman.com>
In-Reply-To: <CACpH0Mf6sNb8JOsTzC+WSfQRB62+Zn7VtzEnihEKmEV2aO2p+w@mail.gmail.com>
References: <CACpH0Mf6sNb8JOsTzC+WSfQRB62+Zn7VtzEnihEKmEV2aO2p+w@mail.gmail.com>
+1 on being able to 'pause scrubs' - that would be awesome.

- Mike

On Jan 20, 2013, at 2:26 PM, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> Please don't misinterpret this post: ZFS's ability to recover from fairly
> catastrophic failures is pretty stellar, but I'm wondering if there can be
> a little room for improvement.
>
> I use RAID pretty much everywhere. I don't like to lose data and disks
> are cheap. I have a fair amount of experience with all flavors ... and ZFS
> has become a go-to filesystem for most of my applications.
>
> One of the best recommendations I can give for ZFS is its
> crash-recoverability. As a counter-example, with most hardware RAID
> or software whole-disk RAID, after a crash the controller will generally
> declare one disk as good and the other disk as "to be repaired" ... after
> which a full surface scan of the affected disks --- reading one and writing
> the other --- ensues. On my Windows desktop, a pair of 2T's takes 3 or 4
> hours to do this. A pair of green 2T's can take over 6. You don't lose
> any data, but you have severely reduced performance until it's repaired.
>
> The rub is that you know only one or two blocks could possibly even be
> different ... and that this is a highly unoptimized way of going about the
> problem.
>
> ZFS is smart on this point: it will recover on reboot with a minimum amount
> of fuss. Even if you dislodge a drive ... so that it's missing the last
> 'n' transactions, ZFS seems to figure this out (which I thought deserved
> extra kudos).
>
> MY PROBLEM comes from problems that scrub can fix.
>
> Let's talk, in specific, about my home array. It has 9x 1.5T and 8x 2T in
> a RAID-Z configuration (2 sets, obviously). The drives themselves are
> housed (4 each) in external drive bays with a single SATA connection for
> each. I think I have spoken of this here before.
>
> A full scrub of my drives weighs in at 36 hours or so.
> Now around Christmas, while moving some things, I managed to pull the plug
> on one cabinet of 4 drives. It was likely that the only active use of the
> filesystem was an automated cvs check-in (backup), given that the errors
> only appeared on the cvs directory.
>
> IN-THE-END, no data was lost, but I had to scrub 4 times to remove the
> complaints, which showed up like this in "zpool status -v":
>
> errors: Permanent errors have been detected in the following files:
>
>         vr2/cvs:<0x1c1>
>
> Now ... this is just an example: after each scrub, the hex number was
> different. As a side note, I also couldn't actually find the error on the
> cvs filesystem. Not many files are stored there, and they all seemed to be
> present.
>
> MY TAKEAWAY from this is that 2 major improvements could be made to ZFS:
>
> 1) a pause for scrub ... such that long scrubs could be paused during
> working hours.
>
> 2) going back over errors ... during each scrub, the "new" error was found
> before the old error was cleared; then this new error gets similarly
> cleared by the next scrub. It seems that if the scrub returned to the newly
> found error after fixing the "known" errors, this could save whole new
> scrub runs from being required.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
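For what it's worth on point (1): at the time, `zpool scrub` only offered start and `-s` (cancel, which forfeits all progress); a true pause/resume later arrived in OpenZFS as `zpool scrub -p`. A dry-run wrapper sketch of a cron-driven "scrub window" follows -- the pool name `vr2` comes from the post above, while the `scrub_ctl` helper and the `ZPOOL` override are hypothetical names, not anything shipped with ZFS:

```shell
#!/bin/sh
# Dry-run sketch: gate a long scrub around working hours.
# By default it only echoes the zpool command it would run;
# set ZPOOL=zpool to execute for real.
scrub_ctl() {
    pool=$1
    case $2 in
        start) set -- scrub "$pool" ;;     # start (or resume, where -p is supported)
        pause) set -- scrub -p "$pool" ;;  # pause -- OpenZFS 0.7+ only
        stop)  set -- scrub -s "$pool" ;;  # cancel -- older ZFS loses all progress
        *)     echo "usage: scrub_ctl <pool> start|pause|stop" >&2; return 1 ;;
    esac
    ${ZPOOL:-echo zpool} "$@"   # deliberately unquoted so "echo zpool" word-splits
}

# cron would invoke these at, say, 08:00 and 18:00:
scrub_ctl vr2 pause
scrub_ctl vr2 start
```

On pre-pause ZFS the `stop` branch is the only off switch, which is exactly the complaint in the post: cancelling a 36-hour scrub means restarting it from block zero.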