Date: Mon, 01 Nov 2004 14:01:40 +0100
From: Martin Nilsson <martin@gneto.com>
To: Brad Knowles <brad@stop.mail-abuse.org>
Cc: current@freebsd.org
Subject: Re: Gvinum RAID5 performance
Message-ID: <418633B4.80004@gneto.com>
In-Reply-To: <p06002006bdabc1160a6a@[10.0.1.3]>
References: <002401c4bf9c$c4fee8e0$0201000a@riker> <p06002002bdab24905ad8@[10.0.1.3]> <1099286568.4185c82881654@picard.newmillennium.net.au> <p06002006bdabc1160a6a@[10.0.1.3]>
You guys seem to confuse byte-level striping with block-level striping and the use of the parity disk. Adaptec has a nice whitepaper that explains this here:

http://graphics.adaptec.com/pdfs/ACSP_RAID_Ch4.pdf

Brad Knowles wrote:
> At 4:22 PM +1100 2004-11-01, Alastair D'Silva wrote:
>> offshoot
>> of this is that to ensure data integrity, a background process is run
>> periodically to verify the parity.

That process (LSI Logic calls it "patrol read") is more about exercising the disks so that seldom-used marginal blocks are spotted in time, just like diskcheckd in ports.

> Keep in mind that if you've got a five disk RAID-5 array, then for
> any given block, four of those disks are data and would have to be
> accessed on every read operation anyway, and only one disk would be
> parity.

No, RAID5 is block striped, so for any read operation you only have to read the block(s) where the data is stored. Use blocks that are as large as possible to avoid accessing more than one drive per transaction.

> Even if you could get away from reading from all disks in the stripe
> (and delaying the parity calculations to a background process), you're
> not going to get away from writing to all disks in the stripe, because
> those parity bits have to be written at the same time as the data and
> you cannot afford a lazy evaluation here.

When writing, you read the data and parity blocks, do some XOR magic, and then write the two blocks out again. This is why RAID5 is so painfully slow at writes: it has to do four disk transactions for every single write transaction. A large battery-backed write-back cache can help with this, both to order the accesses better and to delay write bursts until later, when the disks are less busy.

The parity is used to reconstruct a failed drive, not to check the integrity of data on the drives when reading. Drives have very good error detection when reading data; if data is returned by a read operation, it can be assumed to be correct.
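To make the block-striping and read-modify-write points concrete, here is a toy in-memory RAID5 model that counts physical block I/Os. It is a sketch under stated assumptions: the rotating-parity layout in `locate` is one arbitrary choice made for illustration (real controllers and gvinum may place parity differently), the 4-byte stripe unit is deliberately tiny, and the names `ToyRaid5`, `locate` and `xor` are hypothetical. It shows that a healthy single-block read touches one disk, a small write costs four transactions (read data and parity, write data and parity), and a degraded read rebuilds the lost block by XOR of the surviving blocks in the stripe.

```python
from functools import reduce

BLOCK = 4  # bytes per stripe unit; deliberately tiny for illustration


def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class ToyRaid5:
    """In-memory RAID5 with rotating parity (hypothetical layout, not gvinum's)."""

    def __init__(self, ndisks: int, nstripes: int):
        self.n = ndisks
        self.disks = [[bytes(BLOCK)] * nstripes for _ in range(ndisks)]
        self.failed = None  # index of a failed disk, or None
        self.io = 0         # physical block reads + writes issued

    def locate(self, lba: int) -> tuple[int, int, int]:
        """Map a logical block to (stripe, data disk, parity disk)."""
        stripe, idx = divmod(lba, self.n - 1)
        parity = (self.n - 1) - stripe % self.n  # parity rotates per stripe
        disk = idx if idx < parity else idx + 1  # data skips the parity disk
        return stripe, disk, parity

    def _read(self, disk: int, stripe: int) -> bytes:
        self.io += 1
        return self.disks[disk][stripe]

    def _write(self, disk: int, stripe: int, data: bytes) -> None:
        self.io += 1
        self.disks[disk][stripe] = data

    def read(self, lba: int) -> bytes:
        stripe, disk, _ = self.locate(lba)
        if disk != self.failed:
            return self._read(disk, stripe)  # healthy: only one disk touched
        # Degraded: the lost block is the XOR of all surviving blocks
        # (data and parity) in the same stripe.
        survivors = [self._read(d, stripe) for d in range(self.n) if d != disk]
        return xor(*survivors)

    def write(self, lba: int, data: bytes) -> None:
        """Small write via read-modify-write (assumes a healthy array)."""
        stripe, disk, parity = self.locate(lba)
        old_data = self._read(disk, stripe)
        old_parity = self._read(parity, stripe)
        new_parity = xor(old_parity, old_data, data)  # P' = P ^ D ^ D'
        self._write(disk, stripe, data)
        self._write(parity, stripe, new_parity)


array = ToyRaid5(ndisks=5, nstripes=4)
for lba in range(8):
    array.write(lba, bytes([lba + 1]) * BLOCK)

array.io = 0
array.read(5)
print("read:", array.io)  # 1 disk transaction

array.io = 0
array.write(5, b"ABCD")
print("write:", array.io)  # 4: read data+parity, write data+parity

array.failed = array.locate(5)[1]
array.io = 0
print("degraded:", array.read(5), array.io)  # b'ABCD' rebuilt from 4 survivors
```

The four transactions in `write()` are what a battery-backed write-back cache tries to hide by reordering and deferring them; the model does not simulate caching or full-stripe writes, where parity can be computed without reading anything back.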
If the read fails, the RAID system should mark the drive as down/failed and treat the array as degraded, i.e. use the parity on reads.

/Martin
