From owner-freebsd-hackers  Sun Jul 19 07:12:46 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id HAA17093
          for freebsd-hackers-outgoing; Sun, 19 Jul 1998 07:12:46 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from penmax.com (cc595093-a.mdltwn1.nj.home.com [24.3.192.38])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA17088
          for <Hackers@FreeBSD.ORG>; Sun, 19 Jul 1998 07:12:45 -0700 (PDT)
          (envelope-from vincef@penmax.com)
Received: from penmax.com (rembrandt.penmax.com [10.1.3.2])
	by penmax.com (8.8.8/8.8.8) with ESMTP id JAA01086;
	Sun, 19 Jul 1998 09:59:36 -0400 (EDT)
	(envelope-from vincef@penmax.com)
Message-ID: <35B1FE76.21544D2E@penmax.com>
Date: Sun, 19 Jul 1998 10:11:02 -0400
From: Vincent Fleming <vincef@penmax.com>
X-Mailer: Mozilla 4.05 [en] (Win95; I)
MIME-Version: 1.0
To: Greg Lehey <grog@lemis.com>
CC: Wilko Bulte <wilko@yedi.iaf.nl>, tlambert@primenet.com,
        gibbs@plutotech.com, andre@pipeline.ch,
        Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject: Re: Software RAID-5 performance
References: <19980715094757.P15083@freebie.lemis.com> <199807152003.WAA03283@yedi.iaf.nl> <19980719163859.H435@freebie.lemis.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Greg Lehey wrote:

> On Wednesday, 15 July 1998 at 22:03:32 +0200, Wilko Bulte wrote:
> > As Greg Lehey wrote...
> >> On Tuesday, 14 July 1998 at 20:05:16 +0200, Wilko Bulte wrote:
> >> 3.  As long as the disks didn't physically fail, rebuild the RAID-5
> >>     set after rebooting.
> >
> > I don't think this solves it, as you don't know which block is up to date
> > and which block is not. Or do I miss your point?
>
> Well, you'd have to have some convention like writing the data block
> before the parity block.  Then you could assume that if you found a
> parity error, the parity block would be wrong, so you could fix it.
> Of course, any other consistent assumption would work as well, but
> having the parity block written last would mean you could bring up the
> array read-only while you were rebuilding it.
>

  Actually, you MUST write the data block before the parity block to
be RAB (RAID Advisory Board) RAID-5 compliant.  You've cited the
reason for it yourself: the ability to rebuild.
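
A minimal sketch of why the ordering helps, assuming byte-wise XOR parity
and a stripe held as a list of in-memory blocks.  The names xor_blocks and
repair_parity are illustrative only, not taken from any RAID implementation:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_parity(data_blocks, parity):
    """Under the data-first convention, a mismatch means the parity is stale.

    Returns (parity_to_use, was_repaired).
    """
    expected = xor_blocks(data_blocks)
    if expected == parity:
        return parity, False
    return expected, True


# Simulate a crash between the data write and the parity write.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)
data[1] = b"\xaa\xbb"                          # new data reached the disk...
fixed, repaired = repair_parity(data, parity)  # ...the parity did not
```

Because the data is always written first, the repair step never has to
guess which side of the mismatch to trust.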

A 'hack' we put in our boxes that can be done in software (we've patented it,
but I don't think anyone would care about FreeBSD borrowing the idea) is
to implement 'commit' and 'rollback' features, much like a database.  We don't
log individual block numbers being updated, but instead cut the RAID group
into sections of stripes.  When you go to perform an update on a stripe, you
mark its section 'suspect'.  Occasionally, clear the 'suspect' list when
activity decreases.  We do this with NVRAM, but you could use a chunk of a
disk as well - it'll be slower, but not as slow as a log.  Anyway, you should
always write the data blocks before the parity.
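
The bookkeeping could look roughly like this.  It is only a sketch: the
SuspectMap class, the stripes_per_section parameter and the "no writes in
flight" test for clearing are all assumptions made for illustration, and an
in-memory set stands in for the NVRAM:

```python
class SuspectMap:
    """Tracks which sections of a RAID group may have inconsistent parity.

    A real system would keep this in NVRAM or on a reserved chunk of disk;
    here it is just an in-memory set (an assumption for the sketch).
    """

    def __init__(self, stripes_per_section):
        self.stripes_per_section = stripes_per_section
        self.suspect = set()

    def section_of(self, stripe):
        return stripe // self.stripes_per_section

    def mark(self, stripe):
        """Call BEFORE issuing any write to the stripe."""
        self.suspect.add(self.section_of(stripe))

    def clear_if_idle(self, writes_in_flight):
        """Drop the list only when no update is outstanding."""
        if writes_in_flight == 0:
            self.suspect.clear()


def update_stripe(smap, stripe, write_data, write_parity):
    smap.mark(stripe)   # 1. record the section as 'suspect'
    write_data()        # 2. data block first
    write_parity()      # 3. parity block last
```

One mark covers a whole section, so the cost per write is a single set
insert rather than a log record per block.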

Anyway, when a system failure occurs, and you go to rebuild the RAID group,
you look at the suspect list and rebuild just those sections affected.  It's
a LOT faster than rebuilding the entire group, particularly when using 9 & 18
GB drives.
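
A sketch of that partial rebuild, assuming XOR parity, with the data blocks
of each stripe and the parity blocks held in plain Python lists.  The
function rebuild_suspect and its arguments are hypothetical names:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rebuild_suspect(stripes, parity, suspect_sections, stripes_per_section):
    """Recompute parity only for stripes that lie in suspect sections.

    stripes[s] is the list of data blocks of stripe s and parity[s] is its
    parity block.  Returns the stripe numbers that were touched.
    """
    touched = []
    for section in sorted(suspect_sections):
        start = section * stripes_per_section
        end = min(start + stripes_per_section, len(stripes))
        for s in range(start, end):
            parity[s] = xor_blocks(stripes[s])
            touched.append(s)
    return touched


# Eight stripes, two per section; the crash left stripe 3 with stale parity.
stripes = [[bytes([s]), bytes([s + 100])] for s in range(8)]
parity = [xor_blocks(blocks) for blocks in stripes]
parity[3] = b"\x00"
touched = rebuild_suspect(stripes, parity, {1}, stripes_per_section=2)
```

Only the stripes of section 1 are read and rewritten; the other six are
never visited.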

If you always write the data block before the parity, there's no need to
bring it online read-only while rebuilding.  What's the worst that can
happen?  You update a block with bad parity, generating more bad parity.  So?
When the rebuild gets to that stripe, it'll recalc the parity anyway.  See?
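
To make that concrete, here is a small sketch.  It assumes the update uses
the usual read-modify-write shortcut (new parity = old parity XOR old data
XOR new data), which is my reading of the case and not something spelled
out above; the function names are made up:

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_write(data, parity, index, new_block):
    """Read-modify-write: new parity = old parity ^ old data ^ new data."""
    new_parity = xor(xor(parity, data[index]), new_block)
    data[index] = new_block
    return new_parity


def recalc(data):
    """Full parity recalculation from the data blocks, as a rebuild does."""
    p = bytes(len(data[0]))
    for block in data:
        p = xor(p, block)
    return p


data = [b"\x01", b"\x02", b"\x04"]
stale = b"\xff"                              # parity left wrong by a crash
stale = rmw_write(data, stale, 0, b"\x08")   # update on top of bad parity
still_bad = stale != recalc(data)            # the error carries through
good = recalc(data)                          # what the rebuild writes
```

The update leaves the parity wrong, but the data blocks are all current, so
the recalculation during the rebuild produces correct parity regardless.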

Vince Fleming
Director, Advanced Solutions Group
ECCS, Inc - Makers of High Performance RAID Subsystems
vincef@eccs.com

