Message-ID: <35B1FE76.21544D2E@penmax.com>
Date: Sun, 19 Jul 1998 10:11:02 -0400
From: Vincent Fleming
To: Greg Lehey
CC: Wilko Bulte, tlambert@primenet.com, gibbs@plutotech.com, andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject: Re: Software RAID-5 performance

Greg Lehey wrote:

> On Wednesday, 15 July 1998 at 22:03:32 +0200, Wilko Bulte wrote:
> > As Greg Lehey wrote...
> >> On Tuesday, 14 July 1998 at 20:05:16 +0200, Wilko Bulte wrote:
> >> 3. As long as the disks didn't physically fail, rebuild the RAID-5
> >> set after rebooting.
> >
> > I don't think this solves it, as you don't know which block is up to date
> > and which block is not. Or do I miss your point?
>
> Well, you'd have to have some convention like writing the data block
> before the parity block. Then you could assume that if you found a
> parity error, the parity block would be wrong, so you could fix it.
> Of course, any other consistent assumption would work as well, but
> having the parity block written last would mean you could bring up the
> array read-only while you were rebuilding it.
>

Actually, you MUST write the data block before the parity block to be
RAB (RAID Advisory Board) RAID-5 compliant. You've cited the reason for
it yourself: the ability to rebuild.

A 'hack' we put in our boxes that can be done in software (we've patented
it, but I don't think anyone would care about FreeBSD borrowing the idea)
is to implement 'commit' and 'rollback' features, much like a database.

We don't log individual block numbers being updated, but instead cut the
RAID group into sections of stripes. When you go to perform an update on
a stripe, you mark its section 'suspect'. Occasionally, when activity
decreases, you clear the 'suspect' list. We do this with NVRAM, but you
could use a chunk of a disk as well - it'll be slower, but not as slow as
a log. Either way, you should always write the data blocks before the
parity.

When a system failure occurs and you go to rebuild the RAID group, you
look at the suspect list and rebuild just those sections affected. It's
a LOT faster than rebuilding the entire group, particularly when using
9 & 18 GB drives.

If you always write the data block before the parity, there's no need to
bring it online read-only while rebuilding. What's the worst that can
happen? You update a block in a stripe with bad parity, generating more
bad parity. So? When the rebuild gets to that stripe, it'll recalculate
the parity anyway.

See?

Vince Fleming
Director, Advanced Solutions Group
ECCS, Inc - Makers of High Performance RAID Subsystems
vincef@eccs.com
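
The suspect-section scheme described above can be sketched roughly as follows. This is a toy in-memory illustration only, not the ECCS implementation: the class name Raid5Group, its method names, the rotating parity placement (stripe number modulo disk count), and the use of single integers as "blocks" are all assumptions made for the example. It marks a section suspect before touching a stripe, writes data before parity, and after a simulated crash recomputes parity only for the suspect sections.

```python
from functools import reduce
from operator import xor


class Raid5Group:
    """Toy RAID-5 group; every name here is hypothetical."""

    def __init__(self, n_disks, n_stripes, stripes_per_section):
        self.n_disks = n_disks
        self.n_stripes = n_stripes
        self.stripes_per_section = stripes_per_section
        # blocks[stripe][disk] holds one small integer standing in for a block.
        self.blocks = [[0] * n_disks for _ in range(n_stripes)]
        # Stands in for the NVRAM (or reserved disk area) suspect list.
        self.suspect = set()

    def parity_disk(self, stripe):
        # Assumed layout: parity rotates across the disks stripe by stripe.
        return stripe % self.n_disks

    def _data_disks(self, stripe):
        p = self.parity_disk(stripe)
        return [d for d in range(self.n_disks) if d != p]

    def _compute_parity(self, stripe):
        row = self.blocks[stripe]
        return reduce(xor, (row[d] for d in self._data_disks(stripe)), 0)

    def write(self, stripe, data_index, value, crash_before_parity=False):
        # 1. Mark the whole section suspect before touching the stripe.
        self.suspect.add(stripe // self.stripes_per_section)
        # 2. Write the data block first.
        disk = self._data_disks(stripe)[data_index]
        self.blocks[stripe][disk] = value
        if crash_before_parity:
            return  # simulate a system failure between the two writes
        # 3. Write the parity block last.
        self.blocks[stripe][self.parity_disk(stripe)] = self._compute_parity(stripe)

    def clear_suspects(self):
        # Only meaningful when no writes are in flight (activity has dropped).
        self.suspect.clear()

    def parity_ok(self, stripe):
        # The XOR of all blocks in a consistent stripe, parity included, is zero.
        return reduce(xor, self.blocks[stripe], 0) == 0

    def rebuild(self):
        """Recompute parity for suspect sections only; return stripes touched."""
        rebuilt = 0
        for section in sorted(self.suspect):
            first = section * self.stripes_per_section
            last = min(first + self.stripes_per_section, self.n_stripes)
            for stripe in range(first, last):
                # Data was written first, so data is trusted and parity is redone.
                self.blocks[stripe][self.parity_disk(stripe)] = self._compute_parity(stripe)
                rebuilt += 1
        self.suspect.clear()
        return rebuilt


group = Raid5Group(n_disks=4, n_stripes=100, stripes_per_section=10)
group.write(42, 1, 0x7F, crash_before_parity=True)  # parity of stripe 42 is now stale
stripes_rebuilt = group.rebuild()                   # touches one section, not the whole group
```

The section size is the tuning knob here: larger sections mean fewer updates to the suspect list during normal operation, but more stripes to recompute after a failure.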