Date: Sat, 1 Apr 95 15:19:27 MST
From: terry@cs.weber.edu (Terry Lambert)
To: PVinci@ix.netcom.com (Paul Vinciguerra)
Cc: hackers@FreeBSD.org
Subject: Re: large filesystems/multiple disks [RAID]
Message-ID: <9504012219.AA11992@cs.weber.edu>
In-Reply-To: <199504011440.GAA17939@ix3.ix.netcom.com> from "Paul Vinciguerra" at Apr 1, 95 06:40:04 am
> > It should also be noted that this type of arrangement is extremely
> > fragile -- it's order n^2 for n disks more fragile than file systems
> > not spanning disks at all.  You shouldn't attempt this type of
> > thing without being ready to do backups.  Basically, a failure of
> > one disk could theoretically take out all of your real file systems
> > sitting on logical partitions that spanned that one disk.  Pretty
> > gruesome, really.
>
> Why?  Isn't this what the world is moving toward, a la RAID?  It's my
> understanding that RAID spanning/striping adds about 20% to the file
> system, but IMPROVES reliability/stability.
>
> Most RAID systems I've investigated will let you pull a drive from the
> array, swap it with a new drive, and the system will rebuild the data ...
> or so the claims go ..

No one said anything at all about striping or optimistic replication
here.  All we were talking about was logical partitions that span
multiple physical partitions/drives.

It's fragile because you could, for instance, have four file systems
with blocks in the same 16M area of a disk.  A failure of a 16M area
of a disk where the FS is mapped physically instead of logically could
damage at most two file systems, in the statistically unlikely event
of the 16M area spanning a partition boundary.  A failure of the same
16M area in a logical disk allocation environment could potentially
damage 4 file systems (assuming a 4M allocation unit identical to
those used by AIX).  Thus the damage of a typical failure (failures
typically occur in a physically contiguous area of the disk) is
multiplied.

This isn't truly n^2; that's just a close approximation of the rate of
growth of risk per disk added, not the probability itself.  The actual
probability depends on the average logical partition size divided by
the allocation unit size, relative to the number of allocation units
in a particular set of storage.  Also, this is more applicable to
additional drives than initial drives, since the initial partitioning
is likely to cause the allocation units in a particular logical
partition to start life as contiguous regions (with the exception of
the designated overflow).  When you start adding disks AFTER that, you
basically randomly allocate blocks.

If we start to examine file systems that "know" about the origin of
logical allocation units (breaking the logical partitioning as an
abstraction, somewhat), then the file system can employ strategies
(which none of the BSD file systems implement) to ensure data
integrity.

RAID is an array redundancy mechanism, where the striping is a means
of replication for data integrity (not necessarily optimistic
redundant replication to speed access).  The point of RAID is to have
an array of n disks that store n' disks worth of data so that if a
single failure occurs, no data is lost.  The RAID approach is vastly
more complex than simple volume spanning, and IBM JFS is not a soft
implementation of RAID.  It is true, however, that what you end up
with is a logical n' number of disks as far as the machine is
concerned.
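To illustrate what I mean by n disks storing n' disks worth of data,
here's a toy sketch of the parity arithmetic.  This is purely
illustrative C of my own, not code out of any real RAID controller or
the JFS/LVM sources, and the disk count and stripe size are made up:

/*
 * Toy sketch of single-parity redundancy: NDATA data disks plus one
 * parity disk, so the contents of any single failed disk can be
 * rebuilt by XORing the survivors.  Illustration only.
 */
#include <stdio.h>
#include <string.h>

#define NDATA  3      /* data disks (the n' worth of real data)  */
#define STRIPE 8      /* bytes per disk in one stripe, for show   */

int main(void)
{
    unsigned char disk[NDATA + 1][STRIPE];  /* last one is parity  */
    unsigned char rebuilt[STRIPE];
    int i, b, failed = 1;                   /* pretend disk 1 dies */

    /* Fill the data disks with something recognizable. */
    memcpy(disk[0], "filesys0", STRIPE);
    memcpy(disk[1], "filesys1", STRIPE);
    memcpy(disk[2], "filesys2", STRIPE);

    /* Parity disk: XOR of all the data disks, byte by byte. */
    for (b = 0; b < STRIPE; b++) {
        disk[NDATA][b] = 0;
        for (i = 0; i < NDATA; i++)
            disk[NDATA][b] ^= disk[i][b];
    }

    /* Rebuild the failed disk by XORing every surviving disk. */
    for (b = 0; b < STRIPE; b++) {
        rebuilt[b] = 0;
        for (i = 0; i <= NDATA; i++)
            if (i != failed)
                rebuilt[b] ^= disk[i][b];
    }

    printf("lost disk %d, rebuilt contents: %.8s\n",
           failed, (char *)rebuilt);
    return 0;
}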
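And to put a rough number on the fragility argument above, here's a
quick Monte Carlo sketch -- again my own throwaway C, not anything out
of the BSD or AIX sources; the 4 file systems, 4M allocation units and
16M failure region are just the figures from the example:

/*
 * Rough sketch: how many file systems does a contiguous media failure
 * touch when each file system's blocks are laid out contiguously
 * (physical partitions) versus scattered across the disk (logical
 * allocation units handed out after extra drives were added)?
 */
#include <stdio.h>
#include <stdlib.h>

#define NUNITS   256   /* allocation units on the disk (4M each)     */
#define NFS      4     /* file systems sharing the disk              */
#define FAILSPAN 4     /* failure region in units (4 x 4M = 16M)     */
#define TRIALS   100000

static int count_damaged(const int *owner)
{
    int damaged[NFS] = { 0 };
    int start = rand() % (NUNITS - FAILSPAN + 1);
    int i, n = 0;

    for (i = start; i < start + FAILSPAN; i++)
        damaged[owner[i]] = 1;
    for (i = 0; i < NFS; i++)
        n += damaged[i];
    return n;
}

int main(void)
{
    int physical[NUNITS], logical[NUNITS];
    long phys_total = 0, log_total = 0;
    int i, t;

    /* Physical layout: each FS owns one contiguous slice of the disk. */
    for (i = 0; i < NUNITS; i++)
        physical[i] = i / (NUNITS / NFS);

    /* Logical layout: allocation units scattered among the FSes. */
    for (i = 0; i < NUNITS; i++)
        logical[i] = rand() % NFS;

    for (t = 0; t < TRIALS; t++) {
        phys_total += count_damaged(physical);
        log_total  += count_damaged(logical);
    }

    printf("avg file systems hit, contiguous layout: %.2f\n",
           (double)phys_total / TRIALS);
    printf("avg file systems hit, scattered layout:  %.2f\n",
           (double)log_total / TRIALS);
    return 0;
}

With those figures, the contiguous layout averages just over one
damaged file system per failure, while the scattered layout should
come out around 2.7 -- that's the multiplication I'm talking about.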
Actually, I love file system stuff; if you're really interested, I'd
encourage you to look first at what you have at hand and have source
for in FreeBSD.  There are a number of papers under the name "Ficus"
at ftp.cs.ucla.edu; these should probably be added to the
documentation that is considered part of the information in 4.4BSD,
notably John Heidemann's master's thesis.

After that, I'd suggest looking at the IBM publications on JFS; they
are mostly AIX manuals.  I can get the IBM publication numbers from
home if necessary, but any 3.1 manual set would help.  There's also an
IBM "Device Driver Writer's" manual which includes a disk with a
sample GFS implementation (IBM's abstraction for an installable file
system).  This is supplementary documentation, and must be ordered
separately.

There's an internal USL publication called "SVR4 File System Writer's
Guide" that covers the VM and DNLC interfaces much more thoroughly
than "The Magic Garden Explained" (a book I'd recommend, too).  You
can't get it from USL without a source license, but Dell did provide
it to Dell UNIX owners in a slightly edited format at one time.

There are a number of RAID papers at wuarchive.wustl.edu, and nearly
every Usenix file system paper is available online at the
ftp.sage.usenix.org site.  I think this is where I got the RAID II
(RAID the second, not RAID level 2) papers.

There are some documents that you won't find elsewhere at the UK
document archive -- src.ic.ac.uk (I think).  Oh, and I'd recommend
looking at the "Choices" stuff out of the University of Kentucky
(can't remember the FTP address, sorry).

Finally, there is a draft copy of SPEC 1170 and a number of very
detailed papers on file systems at ftp.digibd.com; this is the
majority of the ftp.uiunix.ui.org archives that I pieced together
after UNIX International went under.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.