From owner-freebsd-fs  Tue Apr 30 17:58:40 2002
Message-ID: <3CCF3D98.3495D84D@mindspring.com>
Date: Tue, 30 Apr 2002 17:58:00 -0700
From: Terry Lambert <tlambert2@mindspring.com>
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony}  (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: utsl@quic.net
Cc: "Andrew P. Lentvorski" <bsder@allcaps.org>,
	freebsd-fs@freebsd.org
Subject: Re: Non-standard root filesystems
References: <20020429153020.Q16532-100000@mail.allcaps.org> <3CCEC7D5.D22356A0@mindspring.com> <20020430204153.GB3603@quic.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

utsl@quic.net wrote:
> On Tue, Apr 30, 2002 at 09:35:33AM -0700, Terry Lambert wrote:
> > FreeBSD treats root mounts as "special", relative to all other
> > mounts.  This is a design error, but overcoming it requires a
> > reorganization of the mount code that's not really politically
> > easy to accomplish, even though it's technically very easy.
> >
> > Some of the stuff Poul is doing right now will probably help
> > you with assembling things like RAID-able volumes in the
> > future -- but not help you right now.
> 
> Linux has a syscall (pivot_root) to swap the root with another mounted
> filesystem. It is occasionally quite useful, and I've been wondering
> about implementing it (or something similar) on FreeBSD.
> 
> Possibly you can tell me why that wouldn't work, or would be a bad
> idea.

Doing that would be very hard.  The way mount points work
won't exactly make it impossible, but it won't make it easy.

Here's the architectural fix:

1)	Separate the mount point covering code from the per FS
	mounting code.

2)	Add a separate VOP for setting the "mounted on" information
	into the superblock.  Some FS's, like FFS, like to record
	the "last mounted on" information; this is not used for
	anything that I've ever seen (right now), so it would
	probably be OK to rip it out completely.  It could later
	be useful for automounting and getting rid of /etc/fstab
	entirely.

3)	When mounting an FS at the VFS_MOUNT layer, simply get a
	pointer into the list of mounted file systems.  *DO NOT*
	deal with the mount point covering at all in the per FS
	code!

4)	Deal with the mount point covering in the higher level
	code; this reduces the amount of crap you have to
	parse in a per FS manner anyway.  The covering is done
	by referencing the entry in the list of mounted file
	systems obtained in #3 (above).

At this point, from the VFS perspective, all mounts -- root and
non-root -- are exactly the same: you implement the one type of
mount (the "fill in this mount table entry and set up the in core
mount structure data" kind), and it's taken care of... the only
difference between a root and a non-root mount is the vnode
covering code for the mount, and that all uses the same code at
a higher layer.
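
The four steps above can be sketched as a toy user-space model in C.
This is not FreeBSD kernel code: every name in it (toy_mount,
toy_vnode, toy_fs_mount, toy_cover, toy_mount_on) is made up for
illustration, and the "mount table" is just a small static array.
It only tries to show the split: the per FS step fills in a table
entry and never sees a vnode, and one shared routine does the
covering for root and non-root mounts alike.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MOUNTS 8

struct toy_vnode;

/* One entry in the (hypothetical) system-wide list of mounted FS's. */
struct toy_mount {
	int in_use;
	char fstype[16];
	char device[32];
	struct toy_vnode *covered;	/* set only by the generic code */
};

struct toy_vnode {
	const char *name;
	struct toy_mount *mounted_here;	/* non-NULL once covered */
};

static struct toy_mount toy_mount_table[TOY_MAX_MOUNTS];

/*
 * Steps 1 and 3: the per FS part.  It claims a mount table entry and
 * fills it in; it is never told where the FS will be attached.
 */
static struct toy_mount *
toy_fs_mount(const char *fstype, const char *device)
{
	for (size_t i = 0; i < TOY_MAX_MOUNTS; i++) {
		struct toy_mount *mp = &toy_mount_table[i];

		if (!mp->in_use) {
			memset(mp, 0, sizeof(*mp));
			mp->in_use = 1;
			/* The entry was zeroed, so the strings stay terminated. */
			strncpy(mp->fstype, fstype, sizeof(mp->fstype) - 1);
			strncpy(mp->device, device, sizeof(mp->device) - 1);
			return mp;
		}
	}
	return NULL;
}

/* Step 4: the generic part.  Cover a vnode with a mount table entry. */
static int
toy_cover(struct toy_vnode *vp, struct toy_mount *mp)
{
	if (vp == NULL || mp == NULL)
		return -1;
	assert(mp->in_use);
	if (vp->mounted_here != NULL || mp->covered != NULL)
		return -1;	/* already covered, or already attached */
	vp->mounted_here = mp;
	mp->covered = vp;
	return 0;
}

/* Root and non-root mounts take this same path; only vp differs. */
static struct toy_mount *
toy_mount_on(struct toy_vnode *vp, const char *fstype, const char *device)
{
	struct toy_mount *mp = toy_fs_mount(fstype, device);

	if (mp == NULL)
		return NULL;
	if (toy_cover(vp, mp) != 0) {
		mp->in_use = 0;	/* release the entry on failure */
		return NULL;
	}
	return mp;
}
```

In this model a root mount is toy_mount_on() called with whatever
vnode stands in for the root, and a /usr mount is the same call with
a different vnode; nothing in toy_fs_mount() has to know which one
it is.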

This would also make your "pivot" FS work correctly... to do that,
you would have to cover an opaque vnode.  You could actually do
this with any vnode, by revoking the vnode, and making it a deadfs
vnode.
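
One possible reading of the "pivot" idea, again as a toy user-space
model in C and not as kernel code.  All names here (pv_vnode,
pv_mount, pv_revoke, pv_cover_opaque, pv_pivot) are hypothetical, and
"revoking" is reduced to flipping a type flag; the real revoke and
deadfs machinery does far more.  The sketch assumes the new root is
already mounted but not yet covering anything, which is what the
split between mounting and covering would allow.

```c
#include <assert.h>
#include <stddef.h>

struct pv_vnode;

struct pv_mount {
	const char *name;
	struct pv_vnode *covered;	/* vnode this mount sits on, if any */
};

enum pv_vtype { PV_LIVE, PV_DEAD };

struct pv_vnode {
	enum pv_vtype type;
	struct pv_mount *mounted_here;
};

/* Stand-in for revoking a vnode and turning it into a dead vnode. */
static void
pv_revoke(struct pv_vnode *vp)
{
	vp->type = PV_DEAD;
}

/* Cover a vnode, making it opaque first if it is still live. */
static int
pv_cover_opaque(struct pv_vnode *vp, struct pv_mount *mp)
{
	if (vp == NULL || mp == NULL)
		return -1;
	if (vp->mounted_here != NULL || mp->covered != NULL)
		return -1;
	if (vp->type != PV_DEAD)
		pv_revoke(vp);
	vp->mounted_here = mp;
	mp->covered = vp;
	return 0;
}

/*
 * Pivot: swap which mount covers the opaque root vnode.  The old root
 * stays mounted (it keeps its mount entry) but no longer covers
 * anything; it is handed back through *oldp so the caller can cover
 * some other vnode with it or unmount it.
 */
static int
pv_pivot(struct pv_vnode *rootvp, struct pv_mount *newroot,
    struct pv_mount **oldp)
{
	struct pv_mount *old;

	if (rootvp == NULL || newroot == NULL || oldp == NULL)
		return -1;
	if (rootvp->type != PV_DEAD || newroot->covered != NULL)
		return -1;
	old = rootvp->mounted_here;
	if (old != NULL)
		old->covered = NULL;
	rootvp->mounted_here = newroot;
	newroot->covered = rootvp;
	*oldp = old;
	return 0;
}
```

The point of the model is only that, once covering is generic, a
pivot is a change to one pointer pair on the opaque vnode and no per
FS code is involved.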


> > As far as software RAID is concerned: it's a bad idea, from a
> > performance perspective; I don't recommend it.  Note that I'm
> > the person who did the original user space RAIDframe port to
> > FreeBSD in the mid 1990's, so I'm not just talking out my butt:
> > the amount of overhead for parity calculation and storage is
> > *considerable*, and makes RAID hardware a *much* better idea.
> 
> I agree with you about the performance. Hardware RAID is faster, more
> reliable, uses fewer resources, etc. However, many people don't have the
> budget for it.

I guess they don't get RAID.

8-) 8-) 8-) 8-).


> In my case, I have production systems running Linux with software RAID.
> I would much rather run hardware RAID and FreeBSD, but I have no budget
> to buy SCSI RAID controllers. Switching to FreeBSD+Vinum would be a
> reasonable solution, but I can't mirror root, and that creates a
> political problem. I get, "If FreeBSD and Vinum will be better, how come
> you can't mirror the root filesystem?"

How does mirroring the root FS recover after an error?  If you
can't load the kernel to load the software RAID, then you can't
run the software RAID to recover from a failure, right?

How does Linux solve this problem?  *Does* Linux solve this
problem, or are we really talking about an unrecoverable
condition that Linux lets you get yourself into, but FreeBSD
doesn't?

-- Terry
