Date: Wed, 25 Aug 2010 16:36:37 -0600 (MDT)
From: "M. Warner Losh" <imp@bsdimp.com>
To: xcllnt@mac.com
Cc: freebsd-arch@FreeBSD.ORG
Subject: Re: RFC: root mount enhancement (round 2)
Message-ID: <20100825.163637.1151864885495248514.imp@bsdimp.com>
In-Reply-To: <AE9A0FB9-E338-447A-A788-C53E94600116@mac.com>
References: <34EF2360-1B68-4E0C-8CCE-409CE141D0B8@mac.com>
	<20100825.150242.450985660301753093.imp@bsdimp.com>
	<AE9A0FB9-E338-447A-A788-C53E94600116@mac.com>
Hey Marcel,

The more I talk about this, the more I think it might be useful in
some ways.

In message: <AE9A0FB9-E338-447A-A788-C53E94600116@mac.com>
            Marcel Moolenaar <xcllnt@mac.com> writes:
:
: On Aug 25, 2010, at 2:02 PM, M. Warner Losh wrote:
: > : Let me mention a problem with the currently implemented root mount
: > : logic as a reminder that something needs to be fixed, even if we
: > : don't want to enhance: A USB disk cannot always be used as a root
: > : file system by virtue of the USB stack releasing the root mount
: > : lock after creating the umass device, but before CAM has created
: > : the corresponding da device. The kernel will try mounting from
: > : /dev/da0 before the device exists, fail and then drop into the
: > : root mount prompt. Often the story ends here -- with failure.
: >
: > Actually, the problem isn't the locking at all. The problem is that
: > the umass SIMs arrive 'late' in the game. By the time they arrive,
: > CAM has already released the root lock. But as phk points out, this
: > is a bug in the usb/cam interaction that should be fixed there, and
: > it is completely irrelevant for your root mounting system.
:
: I perceive the problem differently, because I see no value in waiting
: for *all* devices to appear when the root device is already there.
: That just slows down the boot.
:
: I prefer mounting the root file system as soon as the device appears
: and enhancing the fstab mounting to deal with the device not being
: there yet.
:
: Consequently: the bug is with root_mount_hold() and root_mount_rel()
: as a means to do the right thing...

We don't need to enhance fstab to cope with / not being there. We
need / to be there, one way or another. We may disagree on how best
to make it be there.

In the past I've swung in the direction you talk about too. I've hacked
mountroot() to wait up to a given amount of time for new devices that
contain the root file system to appear before giving up.
That way, if you know you've got the root file system, you can go
right away, but otherwise you do something more intelligent than
'nothing' or 'prompt' when it isn't there. This meshes well with the
.wait directive and your thinking too. The part I didn't like about
this was the arbitrary upper time limit on it. I'd like to wait until
*ALL* devices are done before failing, and accept a '.wait 5' as an
ugly alternative to knowing that all boot devices are there.

I've also thought about having it drop to a prompt, but still notice
when new devices show up. You could automatically proceed, or at the
very least be able to type the new device in once it is there. This
would let the normal boot proceed, kick you to the prompt if, say, the
usb drive fell out, and still let you plug it back in and have the
system pick back up again.

So, if your approach could have some hook for these types of
enhancements (or could be used to implement them), that would be a
compelling reason to support it. Of course, it would still require
knowing when you are done with your initial scans of the device tree,
which is at present an unsolved problem...

: > : Here md# refers to the md unit created by the last .md directive.
: > : Since the logic is for mounting the root file system only, a .md
: > : directive implicitly detaches and releases the previously created
: > : md device before creating a new one. In other words: the
: > : enhancement is not for creating a bunch of md devices.
: > :
: > : Should this be relaxed so that any number of md devices can be
: > : created before we try a root mount?
: >
: > I guess I'm having trouble understanding why you'd need this given
: > that ram disk information is already passed from the boot loader
: > (/boot/loader or in the board's init code (although the latter I don't
: > think is done by any in-tree code)) to the kernel...
:
: You're fixating on the preloaded or compiled-in ramdisk.
: The .md directive is there for vnode-backed images -- the root
: file system image is stored on a file system and memory is
: only used for buffering and caching.

That makes sense. I'm not so much fixating on them as noting that they
work really, really well and are the basis for many live CDs and such.
They are the basis for all the picobsd derivatives as well.

: > read-write compressed works? Also, is compression a property of the
: > md device, or the GEOM that tastes it to see that it is compressed...
: > What does cluster do anyway? I see that as an option for mdconfig,
: > but there's no explanation of it there or in the md man page.
:
: The options are as useful as the md implementation is. The options
: are listed because they appeared in mdconfig. Semantics is not to
: be argued when syntax is discussed :-)

Fair enough... The compression bit was confusing.

: > How do you differentiate between these two roots:
: >
: >     mdconfig -a -t file -f /gerbil.ram
: > and
: >     mdconfig -a -t swap -s 4m
: >     dd if=/gerbil.rom of=/dev/md0 bs=1m
:
: The first is supported, the second isn't. The .md directive only
: supports vnode-backed md devices. There's no point trying to mount
: a malloc- or swap-backed md device because they instantiate empty
: and are useless for root file systems, unless you construct them
: first (using dd is a way to construct them). Supporting the
: construction of a root file system is where things get complicated
: and where I personally don't want to go.

Fair enough. It was mostly just a question for clarification that
wound up rambling far too long.

: > But in that case, you're better off going through
: > /boot/loader for this stuff, which leads me to my next question: Would
: > any md device passed by the boot loader (or compiled into the kernel)
: > effectively be the second one, so that you'd not need any .md
: > directives at all?
:
: You can start off with a preloaded or compiled-in ramdisk, and then
: recursively mount root, including from vnode-backed md devices, so
: the .md directive is not rendered useless by preloading or compiling
: in. You can even end the root mount recursion with the preloaded
: ramdisk last -- this gives you premounted file systems under /.mount
: without having to run /etc/rc (if you want to)...

Is the .md directive globally destructive, or just destructive to the
local level of recursion? If it is just the local level, how do you
specify the unit number? Maybe a better approach would be to
encourage people to mount root based on how file systems are labelled,
rather than on which unit they happen to be taking up... Would that
help any here?

: > : To re-iterate: the logic is recursive. After mounting some file system
: > : as root, the kernel will follow the directives in /.mount.conf (if the
: > : file exists) for remounting the root file system. At each iteration the
: > : kernel will remount devfs under /dev and remount the current root file
: > : system under /.mount within the new root file system.
: > :
: > : Thoughts?
: >
: > How is init handled at each stage? Forked after the last one, I assume?
:
: No, init is only spawned after the root mount recursion ends. The .init
: directive is there to override defaults. This is envisioned to be useful
: for rescue images where you want to spawn /rescue/init or installation
: images where you may want to spawn sysinstall. It eliminates having to
: hardcode the possibilities in the kernel.

Right now you can set init_path through the boot loader, so why would
you need to add the ability to spawn a different one to the scripts?

: In a sense it gives you more freedom in how you want to call your initial
: process without the pitfalls when the root mount recursion ends early due
: to a problem.
:
: As a concrete example, consider having a single file system on a writable
: medium (say /dev/da0) and software images are ISO images stored in it.
: You can install some recovery procedure on /dev/da0 that gets run when
: none of the ISO images can be mounted. The ISO images have /sbin/init
: as init as usual, but you can select to run /sbin/recovery from /dev/da0.
: This allows for a single init executable that performs the right functions
: based on the program name for example...

I think this is a rather convoluted example. The ISO images would
fail to mount only if they were all damaged in a way that would make
them unmountable, true? If the backup ISO is AFU, then what's to say
that /sbin/recovery isn't also AFU? When would you need this?

Without a 'branch' construct of some kind, there's no way to match
machine/platform names here. Given our limited ability to run kernels
on multiple different platforms, I'm not sure how big a deal this
actually would be, but if you can do this, it would be a nice plus.

I presume the default script would be something like (ignoring the
hard coding of device names):

	ufs:/dev/da0s1a
	.wait 5
	.onfail ask

which would mount /dev/da0s1a when it became available, waiting up to
5 seconds and then asking the user if that failed, right?

Warner