Date:      Sun, 19 Jul 1998 10:11:02 -0400
From:      Vincent Fleming <vincef@penmax.com>
To:        Greg Lehey <grog@lemis.com>
Cc:        Wilko Bulte <wilko@yedi.iaf.nl>, tlambert@primenet.com, gibbs@plutotech.com, andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject:   Re: Software RAID-5 performance
Message-ID:  <35B1FE76.21544D2E@penmax.com>
References:  <19980715094757.P15083@freebie.lemis.com> <199807152003.WAA03283@yedi.iaf.nl> <19980719163859.H435@freebie.lemis.com>



Greg Lehey wrote:

> On Wednesday, 15 July 1998 at 22:03:32 +0200, Wilko Bulte wrote:
> > As Greg Lehey wrote...
> >> On Tuesday, 14 July 1998 at 20:05:16 +0200, Wilko Bulte wrote:
> >> 3.  As long as the disks didn't physically fail, rebuild the RAID-5
> >>     set after rebooting.
> >
> > I don't think this solves it, as you don't know which block is up to date
> > and which block is not. Or do I miss your point?
>
> Well, you'd have to have some convention like writing the data block
> before the parity block.  Then you could assume that if you found a
> parity error, the parity block would be wrong, so you could fix it.
> Of course, any other consistent assumption would work as well, but
> having the parity block written last would mean you could bring up the
> array read-only while you were rebuilding it.
>

  Actually, you MUST write the data block before the parity block to
be RAB (RAID Advisory Board) RAID-5 compliant.  You've cited the
reason for it yourself: the ability to rebuild.

A 'hack' we put in our boxes that can be done in software (we've patented it,
but I don't think anyone would care about FreeBSD borrowing the idea) is
to implement 'commit' and 'rollback' features, much like a database.  We don't
log individual block numbers being updated, but instead cut the RAID group
into sections of stripes.  When you go to perform an update on a stripe, you
mark its section 'suspect'.  Occasionally, clear the 'suspect' list when
activity decreases.  We do this with NVRAM, but you could use a chunk of a
disk as well - it'll be slower, but not as slow as a log.  Anyway, you should
always write the data blocks before the parity.
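
To make the bookkeeping concrete, here is a minimal sketch of a suspect
list in Python.  It is an illustration only, not the code in our boxes:
the names (SuspectList, update_stripe, stripes_per_section) are made up,
the list is an in-memory set standing in for NVRAM or a reserved chunk of
disk, and the two write callbacks stand in for the real I/O.

```python
class SuspectList:
    """Hypothetical per-section 'suspect' tracking for a RAID-5 group.

    Assumption: a plain set stands in for NVRAM or a reserved disk area.
    """

    def __init__(self, stripes_per_section):
        self.stripes_per_section = stripes_per_section
        self.suspect = set()   # section numbers with possibly stale parity
        self.in_flight = 0     # updates started but not yet finished

    def begin_update(self, stripe):
        # Record the section, not the individual block or stripe.
        self.suspect.add(stripe // self.stripes_per_section)
        self.in_flight += 1

    def end_update(self):
        self.in_flight -= 1

    def clear_if_idle(self):
        # Only safe to forget the list when nothing is half-written.
        if self.in_flight == 0:
            self.suspect.clear()
            return True
        return False


def update_stripe(suspects, stripe, write_data, write_parity):
    suspects.begin_update(stripe)  # mark the section before touching disk
    write_data()                   # data block first ...
    write_parity()                 # ... parity block last
    # Reached only if both writes completed; otherwise the section
    # stays suspect and the list is never cleared behind its back.
    suspects.end_update()
```

Calling clear_if_idle() from an idle timer would correspond to the
"occasionally, when activity decreases" step.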

Anyway, when a system failure occurs, and you go to rebuild the RAID group,
you look at the suspect list and rebuild just those sections affected.  It's
a LOT faster than rebuilding the entire group, particularly when using 9 & 18
GB drives.
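
A rough sketch of that partial rebuild, again in Python and again only an
illustration: the array is modelled as a list of stripes, each a dict with
a "data" list and a "parity" block, and rebuild_suspect and xor_blocks are
hypothetical names.  It assumes the data blocks are trustworthy (they were
written first), so the parity is simply recomputed from them.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_suspect(stripes, suspect_sections, stripes_per_section):
    """Recompute parity only for stripes in suspect sections.

    Returns the number of stripes rebuilt.  Stripes outside the suspect
    sections are not read or written at all.
    """
    rebuilt = 0
    for section in sorted(suspect_sections):
        start = section * stripes_per_section
        for stripe in stripes[start:start + stripes_per_section]:
            # Assumption: data was written before parity, so data wins.
            stripe["parity"] = xor_blocks(stripe["data"])
            rebuilt += 1
    return rebuilt
```

With, say, 64 stripes per section and a handful of suspect sections, the
loop touches a few hundred stripes instead of every stripe on the drives.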

If you always write the data block before the parity, there's no need to
bring it online read-only while rebuilding.  What's the worst that can
happen?  You update a block with bad parity, generating more bad parity.
So?  When the rebuild gets to that stripe, it'll recalc the parity anyway.
See?
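
A small worked example of that worst case, in Python.  It assumes the
update is a read-modify-write (new parity = old parity XOR old data XOR
new data), which is my assumption for the illustration; the byte values
and function names are arbitrary.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write parity update for a single data block."""
    return xor(xor(old_parity, old_data), new_data)


# A two-data-block stripe whose parity went stale in a crash.
d0 = bytes([0x11, 0x22])
d1 = bytes([0x33, 0x44])
good_parity = xor(d0, d1)
stale_parity = bytes([0xFF, 0x00])   # arbitrary wrong value

# Update d0 while the stripe is still waiting to be rebuilt.
new_d0 = bytes([0x55, 0x66])
after_update = rmw_parity(stale_parity, d0, new_d0)

# What the rebuild writes when it reaches the stripe: parity from data.
recalculated = xor(new_d0, d1)

# The update carried the old error forward unchanged ...
assert xor(after_update, recalculated) == xor(stale_parity, good_parity)
# ... and the rebuild replaces it with parity that matches the new data.
assert recalculated == xor(new_d0, d1)
```

The parity is exactly as wrong after the update as it was before, and the
data blocks themselves are never at risk, so the rebuild fixes it either
way.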

Vince Fleming
Director, Advanced Solutions Group
ECCS, Inc - Makers of High Performance RAID Subsystems
vincef@eccs.com

