Date: Sun, 19 Jul 1998 10:11:02 -0400
From: Vincent Fleming <vincef@penmax.com>
To: Greg Lehey <grog@lemis.com>
Cc: Wilko Bulte <wilko@yedi.iaf.nl>, tlambert@primenet.com, gibbs@plutotech.com, andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject: Re: Software RAID-5 performance
Message-ID: <35B1FE76.21544D2E@penmax.com>
References: <19980715094757.P15083@freebie.lemis.com> <199807152003.WAA03283@yedi.iaf.nl> <19980719163859.H435@freebie.lemis.com>

Greg Lehey wrote:
> On Wednesday, 15 July 1998 at 22:03:32 +0200, Wilko Bulte wrote:
> > As Greg Lehey wrote...
> >> On Tuesday, 14 July 1998 at 20:05:16 +0200, Wilko Bulte wrote:
> >> 3.  As long as the disks didn't physically fail, rebuild the RAID-5
> >>     set after rebooting.
> >
> > I don't think this solves it, as you don't know which block is up to date
> > and which block is not. Or do I miss your point?
>
> Well, you'd have to have some convention like writing the data block
> before the parity block.  Then you could assume that if you found a
> parity error, the parity block would be wrong, so you could fix it.
> Of course, any other consistent assumption would work as well, but
> having the parity block written last would mean you could bring up the
> array read-only while you were rebuilding it.
>

Actually, you MUST write the data block before the parity block to be
RAB (RAID Advisory Board) RAID-5 compliant.  You've cited the reason for
it yourself: the ability to rebuild.

A 'hack' we put in our boxes that can be done in software (we've
patented it, but I don't think anyone would care about FreeBSD borrowing
the idea) is to implement 'commit' and 'rollback' features, much like a
database.  We don't log individual block numbers being updated, but
instead cut the RAID group into sections of stripes.  When you go to
perform an update on a stripe, you mark its section 'suspect'.
Occasionally, clear the 'suspect' list when activity decreases.  We do
this with NVRAM, but you could use a chunk of a disk as well - it'll be
slower, but not as slow as a log.  Either way, you should always write
the data blocks before the parity.

When a system failure occurs, and you go to rebuild the RAID group, you
look at the suspect list and rebuild just those sections affected.  It's
a LOT faster than rebuilding the entire group, particularly when using
9 & 18 GB drives.

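The suspect-section scheme described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not
ECCS's implementation: the class name Raid5Group, the constant
STRIPES_PER_SECTION, the in-memory "disks" and the crash_before_parity
flag are all hypothetical, and a Python set stands in for the NVRAM (or
reserved disk chunk) that holds the suspect list.

```python
from functools import reduce

STRIPES_PER_SECTION = 4  # hypothetical section size, chosen for the example


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Raid5Group:
    """Toy in-memory RAID-5 group; parity is kept in a separate list for clarity."""

    def __init__(self, n_stripes, n_data, block_size=4):
        self.data = [[bytes(block_size) for _ in range(n_data)]
                     for _ in range(n_stripes)]
        self.parity = [bytes(block_size) for _ in range(n_stripes)]
        self.suspect = set()  # stands in for NVRAM or a reserved chunk of disk

    def write(self, stripe, idx, block, crash_before_parity=False):
        # Mark the whole section suspect before touching the stripe.
        self.suspect.add(stripe // STRIPES_PER_SECTION)
        self.data[stripe][idx] = block  # data block first
        if crash_before_parity:
            return  # simulate a failure between the two writes
        self.parity[stripe] = xor_blocks(self.data[stripe])  # parity last

    def clear_suspects(self):
        """Call when activity is low and no writes are in flight."""
        self.suspect.clear()

    def rebuild_suspects(self):
        """After a failure, recompute parity only in suspect sections."""
        rebuilt = []
        for section in sorted(self.suspect):
            start = section * STRIPES_PER_SECTION
            end = min(start + STRIPES_PER_SECTION, len(self.data))
            for stripe in range(start, end):
                self.parity[stripe] = xor_blocks(self.data[stripe])
                rebuilt.append(stripe)
        self.suspect.clear()
        return rebuilt
```

With 16 stripes and an interrupted write to stripe 9, only section 2
(stripes 8 through 11) is marked suspect, so the rebuild touches four
stripes instead of all sixteen.  The trade-off is granularity: larger
sections mean fewer suspect-list updates but more stripes to recompute
after a failure.
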
If you always write the data block before the parity, there's no need to
bring it online read-only while rebuilding.  What's the worst that can
happen?  You update a block with bad parity, generating more bad
parity.  So?  When the rebuild gets to that stripe, it'll recalc the
parity anyway.

See?

Vince Fleming
Director, Advanced Solutions Group
ECCS, Inc - Makers of High Performance RAID Subsystems
vincef@eccs.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
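
The "bad parity generates more bad parity, and the rebuild fixes it
anyway" argument in the message can be checked with a few lines of XOR
arithmetic.  This is a minimal sketch, assuming the update path uses the
usual read-modify-write rule (new parity = old parity XOR old data XOR
new data); the message does not spell out the update method, and the
helper names xor, full_parity and rmw_write are hypothetical.

```python
def xor(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data_blocks):
    """Recompute parity from scratch, as a rebuild pass would."""
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor(parity, block)
    return parity


def rmw_write(data_blocks, parity, idx, new_block):
    """Read-modify-write update: returns the new parity block.

    The result is only correct if the incoming parity was correct;
    a stale parity block stays stale.
    """
    new_parity = xor(xor(parity, data_blocks[idx]), new_block)
    data_blocks[idx] = new_block
    return new_parity
```

Starting from a stripe whose parity was left stale by a crash, an
rmw_write produces another stale parity block, but the data blocks are
all intact, so full_parity over the stripe yields the right answer
whenever the rebuild reaches it.  That is why the array can stay
writable during the rebuild, provided no disk is lost before the stripe
has been recomputed.
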