Date:      Sun, 6 May 2007 15:00:40 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        Ivan Voras <ivoras@fer.hr>, freebsd-geom@freebsd.org
Subject:   Re: graid5 after-reboot problem
Message-ID:  <20070506130040.GB2138@garage.freebsd.pl>
In-Reply-To: <867irmqntm.fsf@dwp.des.no>
References:  <171980743.20070504223126@uzvik.kiev.ua> <125507.38194.qm@web30304.mail.mud.yahoo.com> <f1i3s4$j4n$1@sea.gmane.org> <86fy6bqocr.fsf@dwp.des.no> <20070505233053.GE16398@garage.freebsd.pl> <867irmqntm.fsf@dwp.des.no>



On Sun, May 06, 2007 at 11:54:45AM +0200, Dag-Erling Smørgrav wrote:
> Pawel Jakub Dawidek <pjd@FreeBSD.org> writes:
> > RAID3 is also write-hole safe, btw.
>
> How?  Any write to a RAID3 requires writing the data to one of the
> data disks *and* updating the parity disk.

The "write hole" problem is so important in RAID5 because RAID5 relies
on the old parity block when updating a data block. There are a few
stages to writing a block in RAID5:

1. Read old content of the block you want to write.
2. Read corresponding parity block.
3. XOR parity with old content.
4. XOR parity with new content.
5. Write new content.
6. Write parity.

(This could also be done without reading the old parity, by reading all
the other data blocks in the stripe and recalculating parity from them,
but that is far too inefficient, so this short-cut is the most popular.)
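
The six steps above can be sketched roughly as follows. This is only an
illustration on in-memory byte blocks, assuming plain XOR parity; the
names xor_blocks, rmw_write and full_parity are made up for the example
and do not come from any GEOM class.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(data: list) -> bytes:
    """The inefficient way: XOR every data block of the stripe together."""
    parity = bytes(len(data[0]))
    for block in data:
        parity = xor_blocks(parity, block)
    return parity


def rmw_write(data: list, parity: bytes, idx: int, new: bytes) -> bytes:
    """The short-cut: touch only one data block and the parity block."""
    old = data[idx]                       # 1. read old content
    p = parity                            # 2. read corresponding parity
    p = xor_blocks(p, old)                # 3. XOR parity with old content
    p = xor_blocks(p, new)                # 4. XOR parity with new content
    data[idx] = new                       # 5. write new content
    return p                              # 6. write parity (caller stores it)


# Hypothetical three-data-disk stripe with two-byte blocks.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = full_parity(data)
parity = rmw_write(data, parity, 1, b"\xaa\xbb")
```

Both paths end up with the same parity; the short-cut just needs two
reads and two writes no matter how many disks are in the stripe.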

When you lose power between 5 and 6, your parity will be corrupted and
will stay corrupted forever, because none of the further writes will
update it correctly (the only exception is a full stripe write: then
you don't read the old parity, you just calculate it, because you have
all the data blocks needed).
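
To make the "stays corrupted forever" part concrete, here is a small
simulation of a single stripe. It is a sketch under the same assumption
of plain XOR parity; the stripe dictionary and the crash_before_parity
flag are invented for the example.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def consistent(stripe: dict) -> bool:
    """True when the stored parity matches the XOR of the data blocks."""
    return stripe["parity"] == reduce(xor, stripe["data"])


def rmw_write(stripe: dict, idx: int, new: bytes, crash_before_parity=False):
    """Short-cut write; optionally lose power between steps 5 and 6."""
    old = stripe["data"][idx]
    new_parity = xor(xor(stripe["parity"], old), new)
    stripe["data"][idx] = new            # step 5 reaches the disk
    if crash_before_parity:
        return                           # step 6 never happens
    stripe["parity"] = new_parity        # step 6


def full_stripe_write(stripe: dict, blocks: list):
    """Parity is calculated from the new data only, old parity is ignored."""
    stripe["data"] = list(blocks)
    stripe["parity"] = reduce(xor, blocks)


stripe = {"data": [b"\x01", b"\x02", b"\x04"], "parity": b"\x07"}

rmw_write(stripe, 0, b"\x08", crash_before_parity=True)
after_crash = consistent(stripe)

rmw_write(stripe, 1, b"\x10")            # a later, fully completed write
after_later_write = consistent(stripe)

full_stripe_write(stripe, [b"\x20", b"\x40", b"\x80"])
after_full_stripe = consistent(stripe)
```

The later short-cut write completes both steps, yet the parity is still
wrong, because it was derived from the already stale parity block. Only
the full stripe write repairs it.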

This is very different in RAID3. In RAID3 you always do full stripe
writes, so it looks like this:

1. Write data to all data disks and parity disk at once.

Of course 1 is not atomic, but after a power failure graid3 will
synchronize the parity component. Even if you decide not to do that, the
next write to this block will fix the inconsistency, which is not the
case for RAID5. RAIDZ also does full stripe writes, just like RAID3, but
it is its COW model, not the full stripe writes, that gives it
always-consistent data.
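
The RAID3 behaviour can be sketched the same way. This is a toy model,
not graid3's actual code: the Raid3 class and the fail_after parameter
(how many component writes complete before the simulated power loss) are
assumptions made for the example.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class Raid3:
    """Toy RAID3 holding a single block spread over all data disks."""

    def __init__(self, ndata: int, chunk_size: int):
        self.disks = [bytes(chunk_size) for _ in range(ndata)]
        self.parity = bytes(chunk_size)

    def write(self, block: bytes, fail_after=None):
        """Full stripe write: parity comes from the new data alone."""
        n = len(self.disks)
        size = len(block) // n
        chunks = [block[i * size:(i + 1) * size] for i in range(n)]
        component_writes = chunks + [reduce(xor, chunks)]
        for i, chunk in enumerate(component_writes):
            if fail_after is not None and i >= fail_after:
                return                   # power lost, remaining writes dropped
            if i < n:
                self.disks[i] = chunk
            else:
                self.parity = chunk

    def consistent(self) -> bool:
        return self.parity == reduce(xor, self.disks)


array = Raid3(ndata=2, chunk_size=2)
array.write(b"\x01\x02\x03\x04", fail_after=1)   # only disk 0 was updated
after_crash = array.consistent()

array.write(b"\xaa\xbb\xcc\xdd")                 # next write to this block
after_next_write = array.consistent()
```

Nothing from the interrupted write is read back when the next write
arrives, so the stale state cannot leak into the new parity.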

Also note that using gjournal on top of graid3 will fix the
non-atomicity, but gjournal on top of RAID5 won't fix RAID5's
non-atomicity.

All in all, the write hole is not that dangerous if you remember to
synchronize parity after an unclean shutdown, and this is needed for
RAID5, RAID3, RAID1, RAID4, RAID6, etc. For RAID5 it is just most
visible, and you can't avoid resynchronization even when you use things
like gjournal.
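
For the XOR-parity levels, synchronizing after an unclean shutdown
amounts to something like the sketch below. It assumes the data blocks
are treated as authoritative and parity is simply recalculated; the
resync_parity function and the stripe layout are hypothetical and only
illustrate the idea.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def resync_parity(stripes: list) -> int:
    """Recalculate parity for every stripe; return how many were stale."""
    repaired = 0
    for stripe in stripes:
        expected = reduce(xor, stripe["data"])
        if stripe["parity"] != expected:
            stripe["parity"] = expected
            repaired += 1
    return repaired


stripes = [
    {"data": [b"\x01", b"\x02"], "parity": b"\x03"},  # clean stripe
    {"data": [b"\x08", b"\x02"], "parity": b"\x03"},  # data written, parity not
]
repaired = resync_parity(stripes)
```

Once every stripe has been visited, the short-cut RAID5 write is safe
again, because the parity it reads is correct.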

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!



