Date: Tue, 1 Feb 2000 19:04:41 +1030
From: Greg Lehey <grog@lemis.com>
To: "Justin T. Gibbs" <gibbs@narnia.plutotech.com>, Gary Palmer <gjp@in-addr.com>
Cc: scsi@FreeBSD.org, up@3.am, Wilko Bulte <wilko@yedi.iaf.nl>
Subject: Re: hardware vs software stripping
Message-ID: <20000201190440.Q76348@freebie.lemis.com>
In-Reply-To: <200001311432.HAA32638@narnia.plutotech.com>
References: <up@3.am> <87942.949373872@in-addr.com> <Pine.BSF.4.10.10001301401360.60037-100000@server.b0x.com> <200001311432.HAA32638@narnia.plutotech.com>

On Monday, 31 January 2000 at 7:32:31 -0700, Justin T. Gibbs wrote:
> In article <20000131104827.A62824@freebie.lemis.com> you wrote:
>>
>> I suppose you mean striping. RAID-5 doesn't stripe at the byte level,
>> it stripes at the block level. RAID-3 stripes at the byte level.
>
> I've heard you say this several times, but it is simply not true.

It's not simply true, anyway :-)

I think one of the problems is that I can't find an authoritative
definition of the levels. I was going to buy one of those
super-expensive books that you probably have, but in the meantime I've
been limited to various web pages.

At http://www.fdma.com/info/raidinto.html (now dead), I was told that
RAID-3 stripes at the byte level, and RAID-4 stripes at a block level.
At
http://www.lib.ox.ac.uk/internet/news/faq/archive/arch-storage.part1.html,
I read:

  Raid Level 3 - Data protection disk - mathematical ECC type code
  calculated from multiple spindles and stored on another spindle.

  Raid Level 4 - similar to 3, with block striping instead of byte.

  Raid Level 5 - Striping plus data protection - stripe data across
  multiple spindles (as in RAID Level 0) and have data protection
  calculations (as in RAID level 3) but don't put all the calculated
  figures onto one spindle, but spread it out.

That appears to be less than authoritative. At
http://www.adaptec.com/technology/whitepapers/raid.html, I read:

  RAID Level 3 stripes data at a byte level across several drives,
  with parity stored on one drive. It is otherwise similar to level 4.
  Byte-level striping requires hardware support for efficient use.

  RAID Level 4 stripes data at a block level across several drives,
  with parity stored on one drive. The parity information allows
  recovery from the failure of any single drive. The performance of a
  level 4 array is very good for reads (the same as level 0). Writes,
  however, require that parity data be updated each time.
  This slows small random writes, in particular, though large writes
  or sequential writes are fairly fast. Because only one drive in the
  array stores redundant data, the cost per megabyte of a level 4
  array can be fairly low.

  RAID Level 5 is similar to level 4, but distributes parity among the
  drives. This can speed small writes in multiprocessing systems,
  since the parity disk does not become a bottleneck. Because parity
  data must be skipped on each drive during reads, however, the
  performance for reads tends to be considerably lower than a level 4
  array. The cost per megabyte is the same as for level 4.

Later in this page, I read:

  RAID Level    Uses

  Level 0 (striping)
      Any application which requires very high speed storage, but does
      not need redundancy. Photoshop temporary files are a good
      example.

  Level 1 (mirroring)
      Applications which require redundancy with fast random writes;
      entry-level systems where only two drives are available. Small
      file servers are an example.

  Level 4 (parity)
      Applications which require redundancy at low cost, or with
      high-speed reads. This is good for archival storage. Larger file
      servers are an example.

  Level 5 (distributed parity)
      Similar to level 4, but may provide higher performance if most
      I/O is random and in small chunks. Database servers are an
      example.

Note that they don't mention RAID-2 or RAID-3. I'd agree with all
this except for RAID-4: there's no real advantage to RAID-4 over
RAID-5.

At http://www.baydel.com/tutorial.html I read:

  RAID Levels

  The 1988 RAID paper proposed 5 levels:

  1: mirroring.
  3: byte striping with dedicated parity.
  4: block striping with dedicated parity.
  5: block striping with distributed parity.

  (RAID2 was superseded by RAID3)

  RAID3 was considered to be ideally suited to large 'scientific'
  transfers and RAID5 to OLTP, or Transaction Processing.
  Inexplicably, the researchers gave a strong implication that RAID3
  write performance would be bottlenecked on the parity drive.
  In fact, RAID3 'parallel' write performance is far better than with
  RAID5 or 'independent' RAIDs. Also, over the years, OLTP
  applications have been exhibiting an increasing write load with a
  small I/O size, resulting in a negation of the benefit of RAID5.
  Other applications such as NFS fileserving, Novell, Multi-media etc
  have I/O granularity above a size ideally suited to RAID5.

This is an interesting viewpoint. In many cases, it's true, if you
always transfer a complete number of blocks, since then the pre-reads
of RAID-[45] aren't needed, which nearly doubles the write
performance.

Most of the other URLs I had have died, but they said much the same
sort of thing. Finally, at http://www.whatis.com/raid.htm I read:

  RAID-3. This type uses striping and dedicates one drive to storing
  parity information. The embedded error checking (ECC) information is
  used to detect errors. Data recovery is accomplished by calculating
  the exclusive OR (XOR) of the information recorded on the other
  drives. Since an I/O operation addresses all drives at the same
  time, RAID-3 cannot overlap I/O. For this reason, RAID-3 is best for
  single-user systems with long record applications.

  RAID-4. This type uses large stripes, which means you can read
  records from any single drive. This allows you to take advantage of
  overlapped I/O for read operations. Since all write operations have
  to update the parity drive, no I/O overlapping is possible. RAID-4
  offers no advantage over RAID-5.

  RAID-5. This type includes a rotating parity array, thus addressing
  the write limitation in RAID-4. Thus, all read and write operations
  can be overlapped. RAID-5 stores parity information but not
  redundant data (but parity information can be used to reconstruct
  data). RAID-5 requires at least three and usually five disks for the
  array. It's best for multi-user systems in which performance is not
  critical or which do few write operations.
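
The pages quoted above all lean on the same few mechanisms: parity as
the XOR of the data blocks, the pre-read needed to update parity on a
small write, and a parity disk that is either fixed or rotates from
stripe to stripe. A minimal Python sketch of those three ideas
follows. It is an illustration only, not taken from any of the quoted
pages or from any real RAID implementation; the function names, the
choice of the last disk for dedicated parity, and the rotation
direction are all assumptions made for the example.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_parity(old_parity, old_data, new_data):
    """Parity after rewriting one data block (read-modify-write).

    Needs the old data and the old parity, which is the pre-read that
    a small write costs on a parity array.
    """
    return xor_parity([old_parity, old_data, new_data])


def parity_disk(stripe, ndisks, rotate):
    """Index of the disk holding parity for a given stripe.

    rotate=False: one dedicated parity disk (assumed here to be the last).
    rotate=True:  parity moves by one disk per stripe; the direction of
                  rotation is an arbitrary choice for this sketch.
    """
    if rotate:
        return ndisks - 1 - (stripe % ndisks)
    return ndisks - 1


# Three data blocks in one stripe, plus their parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)

# Lose the second block: XOR of the survivors and the parity rebuilds it.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])

# Small write to the first block: update parity without reading the rest.
new_block = b"\xaa\x55"
updated = rmw_parity(parity, data[0], new_block)
print(updated == xor_parity([new_block, data[1], data[2]]))

# Parity placement over five stripes on four disks.
print([parity_disk(s, 4, rotate=False) for s in range(5)])
print([parity_disk(s, 4, rotate=True) for s in range(5)])
```

Nothing in the sketch depends on whether the striping unit is a byte
or a block; that is a question of layout and is not settled by the
parity arithmetic.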

The whatis.com description comes closest to your definition by not
using the term 'byte' in describing RAID-3, but it doesn't deny the
possibility either. In general, it's a bit vague. Theoretically, the
RAID-3 unit could be sectors, but in my view that would make it a
special case of RAID-4. This page is also inaccurate in its
description of RAID-4 and RAID-5: RAID-4 *can* overlap read
operations, and RAID-5 can't always overlap write operations. In
fact, there's very little difference in the amount of mutual exclusion
needed on writes.

> RAID-3 is the same as RAID4 without the optimization for partial
> stripe writes. In other words, in RAID-3, you must read or write a
> full stripe where RAID-4 adds the ability to perform RMW operations
> on the parity block of the stripe for sub-stripe updates.

I'm not sure I follow you here. Are you saying that the data layout
is the same and the difference is in the implementation of the
software? That doesn't seem to justify a separate level.

> Pluto uses a RAID-3 system in its video server products and it is
> certainly not striped on a byte level.

So how exactly is it striped?

> (Just as an aside, given the minimum 512 byte sector size of most
> magnetic media, striping on a per byte basis would be really
> wasteful).

Agreed, unless you use a PLA to split the data.

Obviously, the manufacturer of your RAID-3 box uses the term
differently from the way it's defined above. There's obviously some
confusion, and I don't know who is right, but I would have thought
Adaptec knew what they're talking about (especially when they point
out the need for hardware support).

On Monday, 31 January 2000 at 21:57:52 -0500, Gary Palmer wrote:
> up@3.am wrote in message ID
> <Pine.BSF.4.10.10001311627210.14233-100000@richard2.pil.net>:
>> IIRC, the main difference between 3 and 5 is that 3 puts all of the parity
>> blocks on one spindle, whereas 5 distributes them across all of the
>> spindles.
>
> You're confusing RAID3 with RAID4.
> RAID4 is RAID 0 with parity (on one spindle) and RAID 5 is RAID 0
> with striped parity.

I'd call RAID-5 rotated parity, not striped. The way I see it,
RAID-[3-5] are all striped.

Before you reply to these messages telling me where I'm wrong, please
check out http://www.lemis.com/vinum/implementation.html and tell me
where you disagree with what I say there.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message