Date:      Thu, 01 Nov 2007 23:20:47 -0500
From:      Joe Koberg <joe@rootnode.com>
To:        Ulf Lilleengen <lulf@stud.ntnu.no>
Cc:        Marco Haddad <freebsd-lists@ideo.com.br>, freebsd-geom@freebsd.org
Subject:   Re: gvinum and raid5
Message-ID:  <472AA59F.3020103@rootnode.com>
In-Reply-To: <20071031215756.GB1670@stud.ntnu.no>
References:  <8d4842b50710310814w3880f7d3ldf8abe3a236cbcc8@mail.gmail.com> <20071031215756.GB1670@stud.ntnu.no>

Ulf Lilleengen wrote:
> On ons, okt 31, 2007 at 12:14:18 -0300, Marco Haddad wrote:
>   
>> I found in recent research that a lot of people say gvinum should not be
>> trusted when it comes to raid5. I began to get worried. Am I alone using
>>
>>     
> I'm working on it, and there are definitely people still using it. (I've
> received a number of private mails as well as those seen on this list). IMO,
> gvinum can be trusted when it comes to raid5. I've not experienced any
> corruption bugs or anything like that with it. 
>   

The source of the mistrust may be the fact that few software-only RAID-5 
systems can guarantee write consistency across a multi-drive 
read-update-write cycle in the case of, e.g., power failure.

There is no way for the software RAID to force the parallel writes to 
complete simultaneously on all drives, and from the time the first 
write starts until the last completes, the array is in an inconsistent 
(corrupted) state.
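
To make that window concrete, here is a minimal sketch of the classic
RAID-5 "small write" path. This is not gvinum code; the disk objects,
their read/write methods, and the xor helper are invented purely for
illustration:

    # RAID-5 small write: new_parity = old_parity XOR old_data XOR new_data,
    # so the old data and parity must be read before the two writes go out.
    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(data_disk, parity_disk, block_no, new_data):
        old_data   = data_disk.read(block_no)      # 1. read old data block
        old_parity = parity_disk.read(block_no)    # 2. read old parity block
        new_parity = xor(xor(old_parity, old_data), new_data)

        data_disk.write(block_no, new_data)        # 3. write new data
        # A crash between these two writes leaves the parity stale; a
        # later rebuild that trusts it reconstructs garbage for a lost disk.
        parity_disk.write(block_no, new_parity)    # 4. write new parity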

Dedicated RAID hardware solves this with battery-backed RAM that 
maintains the array state in a very robust manner.  Dedicated 
controllers also tend to be connected to "better" SCSI or SAS drives 
that properly report write completion via their command queuing protocol.

ZFS tackles this problem by never writing data back in place, by 
checksumming all data and metadata inline (so that corruption is 
detectable), and by using dynamically sized "full stripe writes" for 
every write (no read-update-write cycle required).
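
Roughly, and ignoring ZFS's variable stripe geometry, checksum tree, and
copy-on-write bookkeeping (the disk objects below are again made up), a
full-stripe write computes parity purely from data already in memory:

    from functools import reduce

    def xor(a, b):                                # same helper as above
        return bytes(x ^ y for x, y in zip(a, b))

    def full_stripe_write(data_disks, parity_disk, stripe_no, chunks):
        # One chunk per data disk, all already in memory, so the parity
        # is computed without reading anything back from the array.
        parity = reduce(xor, chunks)
        for disk, chunk in zip(data_disks, chunks):
            disk.write(stripe_no, chunk)
        parity_disk.write(stripe_no, parity)

Because the stripe also lands at a fresh location on disk and the block
pointers are updated only afterwards, a torn write leaves the previous
copy of the data intact rather than a half-updated stripe.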

A solution for gvinum/UFS may be to set the stripe and filesystem block 
sizes the same, so that a partial stripe is never written and thus no 
read-update-write cycle occurs. However, the use of in-place updates 
still leaves the possibility of corrupting data if the write completes 
on one drive in the array but not the others.
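
As a back-of-the-envelope illustration (the figures below are invented,
and real setups are constrained by UFS block-size limits and the stripe
sizes gvinum will accept): a write only bypasses the read-update-write
cycle when it covers a full stripe, i.e. the per-disk stripe size times
the number of data disks, so the sizes have to be chosen with the disk
count in mind.

    # Invented numbers for a hypothetical 5-disk RAID-5 (4 data + 1 parity
    # per stripe); the write must also be stripe-aligned for this to hold.
    disks       = 5
    stripe_size = 16 * 1024               # bytes per disk per stripe
    fs_block    = 64 * 1024               # e.g. newfs -b 65536

    full_stripe = stripe_size * (disks - 1)
    assert fs_block == full_stripe        # one block covers the whole stripe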

The visibility of this "RAID-5 hole" may be very low if you have a 
well-behaved system (and drives) on a UPS.  But since the corruption is 
silent, you can be stung far down the road if something "bad" does 
happen without notice.  That is especially true with ATA drives, whose 
write-back cache behavior is less robust, in small-system environments 
(no backup power, maybe-flaky cabling, etc.).

It is important to note that I am describing a universal problem with 
software RAID-5, and not any shortcoming of gvinum in particular.



Joe Koberg

joe at osoft dot us