From owner-freebsd-arch@FreeBSD.ORG Wed Aug 25 21:11:18 2010
Date: Wed, 25 Aug 2010 15:02:42 -0600 (MDT)
From: "M. Warner Losh" <imp@bsdimp.com>
To: xcllnt@mac.com
Cc: freebsd-arch@FreeBSD.org
Subject: Re: RFC: root mount enhancement (round 2)
Message-Id: <20100825.150242.450985660301753093.imp@bsdimp.com>
In-Reply-To: <34EF2360-1B68-4E0C-8CCE-409CE141D0B8@mac.com>
References: <34EF2360-1B68-4E0C-8CCE-409CE141D0B8@mac.com>

In message: <34EF2360-1B68-4E0C-8CCE-409CE141D0B8@mac.com>
            Marcel Moolenaar <xcllnt@mac.com> writes:
: 2. Negative experiences with the ramdisk root file system as a
:    general approach for mounting a root file system have been
:    expressed.

To be fair, it was both positive and negative experiences.  The
negative experiences were from the server folks, who hated it when,
after an upgrade, the ram disk compiled into the kernel was out of
date or incomplete.  The positive experiences were from the embedded
folks, who used the RAM disk handed to the kernel by the boot loader,
which gives quite a bit more flexibility.  That ram disk comes from a
dedicated flash partition and is well supported by the boot loaders
that are common in the embedded space (mostly because Linux requires
it).  There's even support for compressing the kernel and the ram
disk in the boot loader: it expands the kernel and the ram disk, then
tells the kernel where to find the ram disk.

: Let me mention a problem with the currently implemented root mount
: logic as a reminder that something needs to be fixed, even if we
: don't want to enhance: A USB disk cannot always be used as a root
: file system by virtue of the USB stack releasing the root mount
: lock after creating the umass device, but before CAM has created
: the corresponding da device.  The kernel will try mounting from
: /dev/da0 before the device exists, fails and then drops into the
: root mount prompt.  Often the story ends here -- with failure.

Actually, the problem isn't the locking at all.  The problem is that
the umass SIMs arrive 'late' in the game.  By the time they arrive,
CAM has already released the root mount lock.  But as phk points out,
this is a bug in the usb/cam interaction: it should be fixed there,
and it is completely irrelevant to your root mounting scheme.

: Round 2:
:
: The logic remains mostly the same as described in round 1, but
: gains a directive and limited variable substitution.  These are
: added to decouple the mount directive (${FS}:${DEV}) from the
: creation of the memory disk so that GEOM can do its thing.  As
: such, the creation of a memory disk is now a separate directive:
:
:     .md
:
: To mount the memory disk (UFS in the example), use:
:
:     ufs:/dev/md#
:
: Here md# refers to the md unit created by the last .md directive.
: Since the logic is for mounting the root file system only, a .md
: directive implicitly detaches and releases the previously created
: md device before creating a new one.  In other words: the
: enhancement is not for creating a bunch of md devices.
:
: Should this be relaxed so that any number of md devices can be
: created before we try a root mount?

I guess I'm having trouble understanding why you'd need this, given
that ram disk information is already passed from the boot loader to
the kernel -- either by /boot/loader or by the board's init code
(although I don't think the latter is done by any in-tree code)...
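(For reference, that existing plumbing typically looks something like
the following -- a sketch only; the size, image names and mount
strings are illustrative.  A ram disk compiled into the kernel uses
the stock md(4) options in the kernel config file:

    options     MD_ROOT
    options     MD_ROOT_SIZE=8192                # ram disk size, in kB
    options     ROOTDEVNAME=\"ufs:/dev/md0\"

while a ram disk handed over by /boot/loader is usually wired up in
/boot/loader.conf along these lines:

    mfsroot_load="YES"
    mfsroot_type="mfs_root"
    mfsroot_name="/boot/mfsroot"
    vfs.root.mountfrom="ufs:/dev/md0"

Either way the kernel finds the preloaded image and attaches it as an
md device before the root mount is attempted.)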
: When the md device appears, GEOM gets to taste the provider
: and all kinds of interesting things can happen.  By decoupling
: the creation of the md device and the mount directive, it's
: trivial to handle arbitrarily complex GEOM graphs.  For example:
:
:     ufs:/dev/md#s1a
:     ufs:/dev/md#.uzip
:     ...

Shouldn't the MD device already be created by virtue of the MD_ROOT
junk in the kernel config file?  Why do you need a special directive
to create it...

: For completeness, the syntax of the configuration file (in
: some weird hybrid regex-based specification that is sloppy
: about spaces) to make sure things get fleshed out enough
: for review:
:
: <.mount.conf>
:     (^$)*
:
:     '#'.*
:
:     ':'
:
:     | ','
:
:     | '='
:     | ".md"
:
:     | ','
:
:     "nocompress"     # compress is default
:     | "nocluster"    # cluster is default
:     | "async"
:     | "readonly"

So read-write compressed works?  Also, is compression a property of
the md device, or of the GEOM that tastes it and sees that it is
compressed...  What does cluster do, anyway?  I see that as an option
for mdconfig, but there's no explanation of it there or in the md man
page.

How do you differentiate between these two roots with this scheme:

    mdconfig -a -t file -f /gerbil.ram

versus

    mdconfig -a -t swap -s 4m
    dd if=/gerbil.rom of=/dev/md0 bs=1m

I'm guessing only the former makes sense, although for upgrades maybe
you want the latter so that you can replace /gerbil.rom at any time.
But in that case you're better off going through /boot/loader for this
stuff, which leads me to my next question: wouldn't any md device
passed by the boot loader (or compiled into the kernel) effectively be
the second case, so that you'd not need any .md directives at all?

:     ".ask"
:
:     "wait"
:
:     "onfail"
:
:     "panic"          # default
:     | "reboot"
:     | "retry"
:     | "continue"
:
:     ".init"
:
:     | ':'
:
: To re-iterate: the logic is recursive.  After mounting some file system
: as root, the kernel will follow the directives in /.mount.conf (if the
: file exists) for remounting the root file system.  At each iteration the
: kernel will remount devfs under /dev and remount the current root file
: system under /.mount within the new root file system.
:
: Thoughts?

How is init handled at each stage?  Forked after the last one, I
assume?

Warner
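(To make the quoted syntax concrete: a hypothetical /.mount.conf for
the uzip example -- the image path and comment are illustrative only,
and the exact spellings are only as reliable as the grammar quoted
above -- might read:

    # second-stage root: compressed UFS image living on the first root
    .md /boot/rootfs.img.uzip
    ufs:/dev/md#.uzip

That is: create an md device backed by the image, let GEOM taste it so
the .uzip provider shows up, and then mount the uncompressed UFS file
system it exposes as the new root.)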