Date:      Wed, 15 Jul 1998 06:56:10 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        grog@lemis.com (Greg Lehey)
Cc:        wilko@yedi.iaf.nl, tlambert@primenet.com, gibbs@plutotech.com, andre@pipeline.ch, Matthew.Alton@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject:   Re: Software RAID-5 performance
Message-ID:  <199807150656.XAA08080@usr06.primenet.com>
In-Reply-To: <19980715094757.P15083@freebie.lemis.com> from "Greg Lehey" at Jul 15, 98 09:47:57 am

> > You now have an inconsistent raid5 set.
> 
> Correct.  Similar things happen if a disk loses power while writing,
> but the window is larger for RAID-5, and it's much more difficult to
> detect.  There are a number of "solutions", of course:
> 
> 1.  Intent logging.  Save some copy of the data elsewhere first.
>     Slow.

You can actually "containerize" this.  You end up with something that
looks a lot like IBM's JFS.

The basic idea is that you create a "container" that points to the
old object; then you update the new object to a new location, rewrite
the container, and free up the old object.

This is currently how you can implement multi-record transactions in
databases that don't support exporting a transaction mechanism.
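
As a rough illustration of the container idea, here is a minimal sketch
in Python.  It assumes a POSIX system where rename is atomic and a
directory can be fsync'ed; the helper names (update_object, read_object,
write_durably, fsync_dir) and the file names are made up for the example
and do not come from JFS or any database.

```python
import os
import tempfile

def fsync_dir(path):
    # Flush the directory itself, since metadata ordering is not assumed.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def write_durably(path, data):
    # Write a file and force its contents to stable storage.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def update_object(directory, container, new_name, data):
    """Replace the object the container points to without overwriting it."""
    container_path = os.path.join(directory, container)
    old_name = None
    if os.path.exists(container_path):
        with open(container_path, "rb") as f:
            old_name = f.read().decode()
    # 1. Write the new object to a new location.
    write_durably(os.path.join(directory, new_name), data)
    fsync_dir(directory)
    # 2. Rewrite the container: build it aside, then rename over the old one.
    tmp_path = container_path + ".tmp"
    write_durably(tmp_path, new_name.encode())
    os.replace(tmp_path, container_path)
    fsync_dir(directory)
    # 3. Only now free up the old object.
    if old_name and old_name != new_name:
        os.remove(os.path.join(directory, old_name))

def read_object(directory, container):
    # Follow the container to whichever object it currently names.
    with open(os.path.join(directory, container), "rb") as f:
        name = f.read().decode()
    with open(os.path.join(directory, name), "rb") as f:
        return f.read()
```

A crash at any point leaves the container naming either the complete old
object or the complete new one; at worst an unreferenced object file is
left behind to be cleaned up.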

This approach is useful, for example, if you want to have implied
state between a data file and an index file, but your system does
not guarantee that metadata updates are ordered (whether by soft
updates, delayed ordered writes, or synchronous writes).  EXT2FS is
one such system.

The recent Informix port to Linux had me seriously wondering how
they would address this issue.  I suspect that they do a system-wide
"sync(2)" when they do directory entry manipulation, and fsync(2) before
committing the container data.  A little slower than it has to be, if
run under FreeBSD under Linux emulation, but computationally sound.


> 2.  Battery backup.  Doesn't guard against panics and non-disk
>     hardware failure.

No, but it's a hell of a lot faster as an intention log, and it
resolves most of your issues.  If written correctly, it supports
transaction roll-forward (i.e.: you do the RAM update, mark it valid,
and commit it in the background, removing it only after the commit
is verified -- exactly how PrestoServe(tm) does it...).
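
The roll-forward sequence can be sketched roughly as below.  This is a
toy model, not how PrestoServe is actually implemented: NvramLog stands
in for the battery-backed RAM, a plain dict stands in for the disk, and
all names (NvramLog, commit, roll_forward) are invented for the example.

```python
class NvramLog:
    """Stand-in for battery-backed RAM: assumed to survive a host crash."""

    def __init__(self):
        self.entries = {}  # seq -> [block_no, data, valid]
        self.next_seq = 0

    def record(self, block_no, data):
        seq = self.next_seq
        self.next_seq += 1
        self.entries[seq] = [block_no, data, False]  # do the RAM update
        self.entries[seq][2] = True                  # then mark it valid
        return seq

def commit(log, disk, seq):
    """Background commit: write to disk, verify, and only then drop the entry."""
    block_no, data, valid = log.entries[seq]
    if not valid:
        return False
    disk[block_no] = data
    if disk.get(block_no) != data:  # read back to verify the commit
        return False
    del log.entries[seq]
    return True

def roll_forward(log, disk):
    """After a crash, replay valid entries in order and discard torn ones."""
    replayed = 0
    for seq in sorted(log.entries):
        if log.entries[seq][2]:
            if commit(log, disk, seq):
                replayed += 1
        else:
            del log.entries[seq]
    return replayed
```

Because an entry is removed only after its commit is verified, anything
still in the log after a crash is by definition not yet known to be on
disk, and replaying it is always safe.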


> 3.  As long as the disks didn't physically fail, rebuild the RAID-5
>     set after rebooting.
> 
> None of these is nice.

Unfortunately, if you relied on this last approach, you wouldn't be
able to tell a soft failure from a hard failure.  VXFS (Veritas) on
UnixWare used to have this problem; it assumed that all soft failures
would be resolved transparently (an incorrect assumption).  On slow IDE
disks, the original 1.0 release had a habit of eating "/usr" and marking
it bad.  Unfortunately, you couldn't undo this without a low-level format.
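
To make the rebuild in option 3 concrete, here is a minimal sketch of
single-parity XOR over one stripe.  The function names and the two-byte
blocks are purely illustrative.  Note that a parity mismatch only says
the stripe is inconsistent; it does not say which block is stale, and
rebuilding simply accepts whatever the data blocks now contain.

```python
from functools import reduce

def parity(blocks):
    # XOR the blocks together column by column (all blocks are equal length).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_consistent(data_blocks, parity_block):
    # True when the stored parity matches the data actually on the disks.
    return parity(data_blocks) == parity_block

def rebuild_parity(data_blocks):
    # Rebuild after reboot: recompute parity from whatever data is present.
    return parity(data_blocks)
```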


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
