From owner-freebsd-arch@FreeBSD.ORG  Tue Aug 24 04:33:45 2010
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E2F0B1065670
	for <freebsd-arch@FreeBSD.org>; Tue, 24 Aug 2010 04:33:45 +0000 (UTC)
	(envelope-from bakul@bitblocks.com)
Received: from mail.bitblocks.com (mail.bitblocks.com [64.142.15.60])
	by mx1.freebsd.org (Postfix) with ESMTP id C03378FC0A
	for <freebsd-arch@FreeBSD.org>; Tue, 24 Aug 2010 04:33:45 +0000 (UTC)
Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1])
	by mail.bitblocks.com (Postfix) with ESMTP id CA4DE5B56;
	Mon, 23 Aug 2010 21:33:44 -0700 (PDT)
To: Marcel Moolenaar <xcllnt@mac.com>
In-reply-to: Your message of "Mon, 23 Aug 2010 18:24:07 PDT."
	<4CB9F7C8-39E8-4C3B-A3F8-A5A9EC178E7D@mac.com> 
References: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB@juniper.net>
	<20100823.171201.107001114053031707.imp@bsdimp.com>
	<8C76250B-E272-4807-BD0D-9F50D0BC5E10@mac.com>
	<20100824002350.042A45B3B@mail.bitblocks.com>
	<4CB9F7C8-39E8-4C3B-A3F8-A5A9EC178E7D@mac.com>
Comments: In-reply-to Marcel Moolenaar <xcllnt@mac.com>
	message dated "Mon, 23 Aug 2010 18:24:07 -0700."
Date: Mon, 23 Aug 2010 21:33:44 -0700
From: Bakul Shah <bakul@bitblocks.com>
Message-Id: <20100824043344.CA4DE5B56@mail.bitblocks.com>
Cc: "freebsd-arch@FreeBSD.org" <freebsd-arch@FreeBSD.org>
Subject: Re: RFC: enhancing the root mount logic 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
X-List-Received-Date: Tue, 24 Aug 2010 04:33:46 -0000

On Mon, 23 Aug 2010 18:24:07 PDT Marcel Moolenaar <xcllnt@mac.com>  wrote:
> 
> On Aug 23, 2010, at 5:23 PM, Bakul Shah wrote:
> 
> >> The 2 reasons for doing this in the kernel are:
> >> 1.  resiliency against ABI changes.
> >> 2.  allowing /sbin/init to come from the actual root file system.
> >> 
> >> Both points are impossible to handle efficiently or correctly if
> >> you need user space support in getting to your actual root file
> >> system. You basically have a catch-22 or bootstrap problem, which
> >> a pure in-kernel solution doesn't have.
> > 
> > How about just bundling a small compressed ramfs with the
> > kernel.  The kernel unpacks it, uses it as the initial rootfs
> > and runs init from it. A forth/scheme/lua based program
> > wouldn't add more than a % or so (given that the GENERIC
> > kernel is over 10MB now!).

BTW, a friend tells me this is what Linux does (or, more
likely, what they used for their server startup): basically
a ramdisk with init, loadable drivers, and the tools needed
to get going. Once the actual root fs device is found (even
if disks got switched around, etc.), they switch to the
actual root.
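
To make the "even if disks got switched around" part concrete, here
is a minimal sketch of how an init on the ramdisk might pick the real
root by a label stored on the device rather than by device name. It
is only an illustration: find_real_root, the read_label probe, the
"rootfs" label, and the device names are all assumptions made up for
the example, not anything Linux or FreeBSD actually ships.

```python
from typing import Callable, Iterable, Optional


def find_real_root(
    candidates: Iterable[str],
    read_label: Callable[[str], Optional[str]],
    wanted_label: str,
) -> Optional[str]:
    """Return the first candidate device whose label matches wanted_label.

    read_label is a hypothetical probe supplied by the caller; it returns
    the label stored on a device, or None if the device has no label or
    cannot be read.
    """
    for dev in candidates:
        if read_label(dev) == wanted_label:
            return dev
    return None


# Simulated probe results: the same two disks, enumerated in a different
# order on two boots (device names and labels are invented).
labels_boot1 = {"/dev/da0": "rootfs", "/dev/da1": "scratch"}
labels_boot2 = {"/dev/da0": "scratch", "/dev/da1": "rootfs"}

root1 = find_real_root(sorted(labels_boot1), labels_boot1.get, "rootfs")
root2 = find_real_root(sorted(labels_boot2), labels_boot2.get, "rootfs")
print(root1, root2)
```

The point of keying on a label is that the result follows the disk,
not the slot it happened to be enumerated in.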

> Not impossible, but it isn't exactly simpler from what I'm looking
> for:
> 1.  The /sbin/init being run is not the one on the actual (final)
>     root file system. Getting that one to run requires a special
>     init on the ramdisk.

Yes. But then you just exec() the real init once you have
"pivoted" to the final root fs. You run with ramfs only as
long as you have to.
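
Roughly, the handoff could look like the sketch below. It assumes the
real root is already mounted somewhere under the ramfs, and it uses a
plain chroot() as a stand-in for a true pivot, so the ramfs is left
behind rather than freed. handoff_plan, do_handoff, and the "/mnt"
mount point are hypothetical names for the example. Since exec()
keeps the process ID, the real init still runs as PID 1.

```python
import os
from typing import List, Tuple


def handoff_plan(new_root: str, init_path: str = "/sbin/init") -> List[Tuple[str, str]]:
    """Describe, in order, the steps for leaving the ramfs."""
    return [
        ("chroot", new_root),  # make the real root the process root
        ("chdir", "/"),        # drop any working directory on the ramfs
        ("exec", init_path),   # replace ourselves with the real init
    ]


def do_handoff(new_root: str, init_path: str = "/sbin/init") -> None:
    """Carry out the plan. Needs root privileges; does not return on success."""
    for op, arg in handoff_plan(new_root, init_path):
        if op == "chroot":
            os.chroot(arg)
        elif op == "chdir":
            os.chdir(arg)
        elif op == "exec":
            os.execv(arg, [arg])


# Only print the plan here; do_handoff() would actually replace this process.
for step in handoff_plan("/mnt"):
    print(step)
```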

> 2.  The R/O image needs the underlying file system mounted some-
>     where so that there's persistent storage to write. Setting
>     all of this up in user space is impossible if the underlying
>     file system(s) needs to be unmounted/unmountable.
>
> 3.  Upgrades and downgrades are tricky to handle when the root
>     F/S is the ramdisk, after which some user space environment
>     has to find the storage media and then mount it using mount
>     options it has no easy way to obtain.

Would that still be a problem once you switch to the final root?

> It appears that this solution, while in user space, requires more
> code and special handling than a "simple" recursive algorithm for
> something the kernel has to do anyway. I may be mistaken though...

It may start out "simple"....