From owner-freebsd-hackers  Fri Mar 31 23:05:18 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id XAA03090 for hackers-outgoing; Fri, 31 Mar 1995 23:05:18 -0800
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id XAA03078 for <freebsd-hackers@FreeBSD.org>; Fri, 31 Mar 1995 23:05:16 -0800
Received: by cs.weber.edu (4.1/SMI-4.1.1)
	id AA08921; Fri, 31 Mar 95 23:58:48 MST
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9504010658.AA08921@cs.weber.edu>
Subject: Re: large filesystems/multiple disks
To: henrich@crh.cl.msu.edu (Charles Henrich)
Date: Fri, 31 Mar 95 23:58:47 MST
Cc: freebsd-hackers@FreeBSD.org
In-Reply-To: <199504010545.VAA01084@freefall.cdrom.com> from "Charles Henrich" at Apr 1, 95 00:45:28 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> Are there any plans/work in progress for giving FreeBSD the ability to have
> filesystems span multiple physical volumes (à la the Logical Volume Manager
> on AIX/OSF)?

Both Phil Neiswanger and I have toyed with this idea.  At one time,
I had logical volume management working (badly) under 386BSD 0.1
patchkit 1 for ESDI drives.

The main issue is that the concept of partitioning/slicing/whatever
truly needs to be divorced from device nodes.

FreeBSD-current is actually moving further away from this (or closer
to it, via the logical but roundabout incremental-improvement
approach, depending on your point of view).  One thing that is
critical is that devices stay where you left them, so dynamic disk
ID assignment is right out, at least at the kernel level (a logical
partition could be renamed dynamically with relatively little pain).

The main problem is that there needs to be file system support for the
idea of additional disk space, i.e., one place where new storage can
be added on.

This will work with IBM's JFS (obviously) or with a log-structured
file system, but precious little else.  UFS is particularly badly
suited to it.  If you do what SPRITE does and shove all the inodes
in one area and all of the data blocks in another, you can sort of
do this for UFS.  The alternatives are to preallocate a very large
number of inodes in the first place (which is what I did), or to
back up, re-mkfs the file system after adding the storage, and
restore everything.

It should also be noted that this type of arrangement is extremely
fragile: for n disks, it's on the order of n^2 more fragile than file
systems that don't span disks at all.  You shouldn't attempt this
type of thing without being ready to do backups.  Basically, a
failure of one disk could theoretically take out all of your real
file systems sitting on logical partitions that spanned that one
disk.  Pretty gruesome, really.

The setup I had running relatively happily used ESDI drives and
relied on a working Bad144 mechanism, so it's kind of double-damned;
if I were to do it today, it'd be a complete rewrite.

One of the major pieces is the management code that determines which
4M chunk on a physical disk is allocated to which logical disk slice.

Needless to say, I binary-edited the allocations by hand, so I never
had one of these written; you'd have to write one as well.


The main gain is just-in-time meeting of storage requirements on
huge databases that grow slowly and incrementally.  The next most
popular use is to add swap space to a system by growing the logical
partition that's the swap area; AIX is very swap hungry, being even
more obscenely radical about memory overcommit than most systems.
With the ability to swap on files (which BSD has), this is largely
a useless application.  So while it is a cool feature, it has
limited practical utility.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.