Date: Tue, 01 Oct 1996 09:11:18 -0700
From: Jason Thorpe <thorpej@nas.nasa.gov>
To: Poul-Henning Kamp <phk@critter.tfs.com>
Cc: James Graham <greywolf@siva.captech.com>, "Kevin P. Neal" <kpneal@pobox.com>, hackers@freebsd.org, tech-kern@netbsd.org
Subject: Re: VPS mailing list, BSD interest?
Message-ID: <199610011611.JAA00870@lestat.nas.nasa.gov>

[ Keeping in mind, I haven't been thinking about mass storage for some time, and was hoping to keep my brain out of that mode, but whatever :-) ]

On Tue, 01 Oct 1996 09:18:47 +0200 Poul-Henning Kamp <phk@critter.tfs.com> wrote:

> The problem I'm referring to is that this should not be done in a
> pseudo-driver, but as a general framework for bdevs.
>
> For instance, why can't I have my root-partition striped ?

I think a better question is "why would I _want_ my root partition striped?" :-)

(The real answer to your question is "Because then you've added unnecessary clutter to the ccd configuration code to deal with both statically- and dynamically-configured ccds". In my mind, saying that your MUST WORK AT ALL COSTS filesystem isn't allowed to be striped is an acceptable trade-off :-) )

> There is no significant difference between the FDISK, bsd-disklabel,
> mirror, striping and raid 5 operations. They all translate a
> (dev+blkno+len) tuple to one or more similar tuples.

...True, but the way they're translated makes a world of difference. In the case of mirroring, you're translating a <dev+blk+len> into multiple <dev+blk+len> for writes, and for reads, you want to find the least-busy component, attempt the read, and then retry with another component if that read fails (indeed, you want to continue trying until you're out of living components).

The vast majority of the code in the ccd deals with configuration (looking up the components, constructing the interleave table, etc.). The actual translation code is small ...

The same is true of the mirror driver I started (but never finished). It was mostly configuration, though the translation code was a bit more complicated due to "mirror on writes, read from least busy with error recovery" semantics. The mirror driver also, by design, doesn't support disklabels (doesn't make any sense, really; perhaps I just want to mirror a single partition).
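
To make the difference between the two translations concrete, here is a minimal user-space sketch in Python. It is not the ccd or mirror driver code. Everything in it is an assumption for illustration: the names `stripe_translate`, `mirror_write`, `mirror_read`, the `Component` class with its `busy` counter, and the simplification that all stripe components are the same size (so a divmod stands in for a real interleave table).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Extent = Tuple[str, int, int]  # hypothetical (dev, blkno, len), all in blocks


def stripe_translate(components: List[str], interleave: int,
                     blkno: int, length: int) -> List[Extent]:
    """Split one logical extent into per-component extents.

    Assumes equal-sized components striped in `interleave`-block chunks.
    """
    out = []
    n = len(components)
    while length > 0:
        chunk, offset = divmod(blkno, interleave)   # which chunk, where in it
        row, col = divmod(chunk, n)                 # stripe row, component index
        run = min(length, interleave - offset)      # stop at the chunk boundary
        out.append((components[col], row * interleave + offset, run))
        blkno += run
        length -= run
    return out


@dataclass
class Component:
    dev: str
    busy: int = 0        # assumed load measure, e.g. outstanding requests
    alive: bool = True


def mirror_write(components: List[Component],
                 blkno: int, length: int) -> List[Extent]:
    """A write fans out to the same block range on every living component."""
    return [(c.dev, blkno, length) for c in components if c.alive]


def mirror_read(components: List[Component], blkno: int, length: int,
                do_read: Callable[[str, int, int], bytes]) -> bytes:
    """Try the least-busy living component first; on failure, try the next."""
    living = sorted((c for c in components if c.alive), key=lambda c: c.busy)
    for c in living:
        try:
            return do_read(c.dev, blkno, length)
        except OSError:
            continue     # this sketch just moves on to the next component
    raise OSError("no living components left")
```

The striping path is pure arithmetic on the block number, while the mirror read path carries policy (load ordering) and error recovery, which is where the extra complexity comes from.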

In short, the semantics of "tuple translation" differ vastly between the ccd and the mirror driver, and both are on planet 9 compared to regular partition translation. Smashing the configuration of those two drivers (I'd actually rather call them `layers') together would be silly, because they have different configuration needs.

In my little world, the right way to get mirroring + striping is to either:

 - make several 2- or 3- (or N-)way `mirror disks' and use those mirror disks as components for a ccd.

 - make 2, 3 (or N) identical ccds, and use those as the components of a `mirror disk'.

...depending on the behavior you want (probably the former).

It's not clear there's any real architectural benefit from creating a generic framework for doing this sort of translation. In fact, I see at least one very negative outcome: you slow down and bloat up the simple case of partition translation (which, as it stands now, is very fast, and very simple).

Having worked with IRIX's logical volume stuff, the principle of KISS was high on my list when doing the ccd work :-)

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939