From owner-freebsd-arch@FreeBSD.ORG  Mon Aug 23 23:13:16 2010
Date: Mon, 23 Aug 2010 17:12:01 -0600 (MDT)
Message-Id: <20100823.171201.107001114053031707.imp@bsdimp.com>
To: marcelm@juniper.net
From: "M. Warner Losh" <imp@bsdimp.com>
In-Reply-To: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB@juniper.net>
References: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB@juniper.net>
X-Mailer: Mew version 6.3 on Emacs 22.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-arch@FreeBSD.org
Subject: Re: RFC: enhancing the root mount logic

In message: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB@juniper.net>
            Marcel Moolenaar <marcelm@juniper.net> writes:
: All,
: 
: In embedded products, software is possibly installed as an image onto
: an actual storage device. This means that mounting the storage device
: as root is not enough to have a usable root file system. The rough
: draft below is an idea to enhance the root mount from having ad-hoc
: quirks to a well-defined and recursive mechanism that allows a wide
: range of use cases.
: 
: The root mount logic is recursive as follows:
: 1.  The kernel mounts devfs as root (as it is now).
: 2.  The kernel will re-mount root by virtue of reading a file, called
:     /.mount.conf, in the current root file system and following the
:     directives in it. devfs synthesizes the contents of this file.
: 
: At each iteration, the kernel will:
: 1.  move the devfs mount from /dev in the old file system to /dev in
:     the new file system.
: 2.  As per the directives or unconditionally, the kernel will re-mount
:     the old root file system under /.mount (or some other name) within
:     the new file system.
: 
: devfs will synthesize the contents of /.mount.conf as per the kernel
: configuration and tunables. The administrator (or install process)
: will create and populate /.mount.conf for all other cases.
: 
: Directives in /.mount.conf are envisioned to be something like:
: 
:    {FS}:{MOUNTPOINT}	e.g.	ufs:/dev/da0
: 	a root mount alternative. The order of the alternatives in
: 	the file determines the priority.
: 
:    .ask
: 	a root mount alternative that asks the operator to specify
: 	what the root mount should be.
: 
:    .wait N			e.g.	.wait 5
: 	wait at most N seconds for a root mount alternative to
: 	succeed. If an alternative does not succeed within that
: 	time, move on to the next alternative.
: 
:    .onfail	{panic|reboot|retry|continue}
: 	Tells the kernel what to do in case it can't successfully
: 	complete the root mount as directed to.
: 
: The .wait directive works better (probably) if we have events that
: signify the arrival of a file system or device special file, so that
: we can wait for at most N seconds after the last event. This also
: allows us to wait for a separate interval between events.
: 
: As an example, consider:
: 
:    [devfs]	/.mount.conf:
: 	ufs:/dev/da0
: 	.ask
: 	.wait 5
: 	.onfail panic
: 
:    [ufs:/dev/da0]	/.mount.conf
: 	md0:/images/OS-image-1.0.iso
: 	unionfs:/jail/freebsd-8-stable
: 	.wait 0
: 	.onfail continue
: 
: In the example, the kernel will mount devfs, read /.mount.conf and
: wait at most 5 seconds to mount the UFS on /dev/da0. If that fails,
: the kernel will ask (once) and panic in case of failure.
: 
: If the UFS root mount succeeded, the kernel will re-mount devfs
: underneath /dev. Since this is the first non-devfs root file system,
: the kernel will not re-mount the old root under /.mount.
: 
: Since there's a /.mount.conf on the UFS, the kernel will read it
: and repeat the process. First it'll try and mount the OS image
: in /images/OS-image-1.0.iso and if it's not present will try to
: mount some -stable 8 chroot using unionfs (not necessarily a
: real-world example here :-) If either fails, the kernel will
: continue booting using the current root file system. Assuming that
: the image is present, the kernel will re-mount root, move devfs
: underneath /dev in the MD root and remount ufs:/dev/da0 under
: /.mount in the MD root. This gives the following picture:
: 
: /		md0:[ufs:/dev/da0]/images/OS-image-1.0.iso
: /.mount		ufs:/dev/da0
: /dev		devfs
: 
: 
: Things not explicitly touched upon:
: o   root mount options
: o   directives to instruct the kernel what to run as the initial
:     process to eliminate the rather ad-hoc hardcoding. E.g:
: 	.init /sbin/init
: 	.init /sbin/init.old
: 
: Is this something that people feel is worth fleshing out and
: prototyping?

This sounds very interesting.  If kept simple, I could see how this
would make my life a lot easier.

However, all this scripting sounds a bit like a very simple shell in
the kernel.  What advantages are there to this approach vs having the
ability to run a simple shell script or executable and "pivot" the root
to a new location?  And how do you emulate the mount_foo programs for
foo filesystems?  Some of them do weird things that might not
translate well into the kernel...
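
As a rough illustration of how small the directive language is as
proposed, here is a minimal Python sketch of a parser for it. This is
not kernel code and not anything from the proposal itself: the names
parse_mount_conf and MountConf are hypothetical, and the defaults for
.wait and .onfail are assumptions, since the proposal does not specify
any.

```python
from dataclasses import dataclass, field

ONFAIL_ACTIONS = {"panic", "reboot", "retry", "continue"}


@dataclass
class MountConf:
    # Ordered root mount alternatives: ("ufs", "/dev/da0") or (".ask", None).
    alternatives: list = field(default_factory=list)
    wait: int = 0           # assumed default; the proposal does not give one
    onfail: str = "panic"   # assumed default; the proposal does not give one


def parse_mount_conf(text):
    """Parse the directive syntax sketched in the proposal (hypothetical helper)."""
    conf = MountConf()
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        if word == ".ask" and len(parts) == 1:
            conf.alternatives.append((".ask", None))
        elif word == ".wait" and len(parts) == 2:
            conf.wait = int(parts[1])  # raises ValueError on a non-number
        elif word == ".onfail" and len(parts) == 2 and parts[1] in ONFAIL_ACTIONS:
            conf.onfail = parts[1]
        elif not word.startswith(".") and len(parts) == 1 and ":" in word:
            # {FS}:{target}, split at the first colon only
            fs, _, target = word.partition(":")
            conf.alternatives.append((fs, target))
        else:
            raise ValueError("bad directive: %r" % raw)
    return conf


# The [devfs] example from the proposal.
conf = parse_mount_conf("ufs:/dev/da0\n.ask\n.wait 5\n.onfail panic\n")
print(conf.alternatives, conf.wait, conf.onfail)
```

Four directives parse in a couple of dozen lines.  The open question
is how much this grows once mount options, compressed images, RAID or
crypto need their own syntax.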

As you can see, I'm torn about how I feel about the idea.  For simple
cases, I think it is great, but as complexity builds, I become less
sure.  What if that iso image was compressed?  What if I had a
software RAID of disks or flash devices?  What about crypto?  I know I
can handle those cases in /bin/sh, but will each new one require more
code in the kernel?  What would df and/or mount tell you about the
now-hidden file systems?
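
To frame the df/mount question, here is a toy Python model of the
recursion as described in the proposal.  It is only a sketch under
stated assumptions: simulate_root_mount is a hypothetical name, .ask
and .wait are ignored, every failure is treated like ".onfail
continue", and only the immediately previous root is kept under
/.mount (the proposal does not say what happens to older ones).

```python
def simulate_root_mount(confs, available):
    """Return the visible mount table after the recursive root mount.

    confs maps a root spec to the ordered alternatives found in its
    /.mount.conf; available is the set of specs that mount successfully.
    """
    root = "devfs"
    table = {"/": "devfs"}
    seen = {root}  # guard against a configuration that loops
    while root in confs:
        # First alternative that can be mounted wins (file order = priority).
        new_root = next((alt for alt in confs[root] if alt in available), None)
        if new_root is None or new_root in seen:
            break  # assumed ".onfail continue": keep the current root
        table = {"/": new_root, "/dev": "devfs"}
        if root != "devfs":
            # The first non-devfs root does not get the old root under /.mount.
            table["/.mount"] = root
        seen.add(new_root)
        root = new_root
    return table


confs = {
    "devfs": ["ufs:/dev/da0"],
    "ufs:/dev/da0": ["md0:/images/OS-image-1.0.iso",
                     "unionfs:/jail/freebsd-8-stable"],
}
table = simulate_root_mount(
    confs, {"ufs:/dev/da0", "md0:/images/OS-image-1.0.iso"})
for mountpoint in sorted(table):
    print(mountpoint, table[mountpoint])
```

With the example from the proposal this yields the same three entries
as the picture there.  With a third level of nesting, under the
assumption above, the oldest root would no longer appear in the table
at all, which is the case the df/mount question is about.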

Warner