Date: Mon, 23 Aug 2010 18:14:24 -0600 (MDT)
From: "M. Warner Losh" <imp@bsdimp.com>
To: xcllnt@mac.com
Cc: freebsd-arch@freebsd.org
Subject: Re: RFC: enhancing the root mount logic
Message-ID: <20100823.181424.646155203640260173.imp@bsdimp.com>
In-Reply-To: <8C76250B-E272-4807-BD0D-9F50D0BC5E10@mac.com>
References: <AFBE2FCA-30A6-4E1D-A964-AC4DC4C843EB@juniper.net> <20100823.171201.107001114053031707.imp@bsdimp.com> <8C76250B-E272-4807-BD0D-9F50D0BC5E10@mac.com>

In message: <8C76250B-E272-4807-BD0D-9F50D0BC5E10@mac.com>
            Marcel Moolenaar <xcllnt@mac.com> writes:
:
: On Aug 23, 2010, at 4:12 PM, M. Warner Losh wrote:
:
: *snip*
:
: > However, all this scripting sounds a bit like a very simple shell in
: > the kernel. What advantages are there to this approach vs having the
: > ability to run a simple shell script or executable and "pivot" the root
: > to a new location?
:
: The 2 reasons for doing this in the kernel are:
: 1. resiliency against ABI changes.
: 2. allowing /sbin/init to come from the actual root file system.
:
: Both points are impossible to handle efficiently or correctly if
: you need user space support in getting to your actual root file
: system. You basically have a catch-22 or bootstrap problem, which
: a pure in-kernel solution doesn't have.

OK. That makes sense. Without execing the new init, which may be a
problem with the current world view of init(8) and the kernel, you'd
have to have your final init on the first level root file system.

: > And how do you emulate the mount_foo programs for
: > foo filesystems? Some of them do weird things that might not
: > translate well into the kernel...
:
: True. I haven't flushed that out, but I was hoping that nmount(2)
: would have normalized most of this that it's a non-issue, provided
: we support mount options in this scheme.
:
: If you have a concrete example of something that's not so trivial,
: but critical to support, let me know and I'll take it into account.

mount_smbfs makes a connection to the remote system to do
authentication (presently done in mount_smbfs itself) and initializes
the smb context before mounting the file system in the kernel. I
don't know if I'd call this a critical-to-support feature, but it was
the first "exception" to the rule that jumped into my head, so I was
curious if you'd thought about it.

: > As you can see, I'm torn about how I feel about the idea.
: > For simple
: > cases, I think it is great, but as complexity builds, I become less
: > sure. What if that iso image was compressed?
:
: Can you elaborate how this is potentially a problem in this scheme,
: but not for "manual" mounting?

You'd need a way to stack up different modules, since you'd need
geom_uzip over md0 to make it useful to the cd9660 code.

: > What if I had a
: > software RAID of disks or flash devices?
:
: I see no problem. In fact, the idea is triggered by switching to a
: flash file system on a NAND flash.

RAID of Flashes. Something that would need configuration. But you
may be correct: this level of flexibility may not be needed and other
concerns may trump it...

: > What about crypto?
:
: See above. Can you elaborate?

Same thing, but with a crypto key :)

: > I know I
: > can handle those cases in /bin/sh, but will each new one require more
: > code in the kernel?
:
: The way I see it is that the approach enhances how we now mount the
: root file system. We have very limited flexibility. I do not claim
: that my idea allows every possible variation, and I think it unfair
: to expect that of the approach. If one has real complex requirements,
: one can always just mount some file system on some storage device
: and deal with the root mount in user space. I don't see how this
: prevents that.

init(8) is the show stopper to a pivot root approach, unless you could
tell the init that's on the first level, and simple, to exec /sbin/init
to pick up the new copy, but I don't know how happy that would make
the kernel.

: > What would df and/or mount tell you about the
: > now-hidden file systems?
:
: Can you explain what you mean by now-hidden file systems?

OK. Let's say we have a three level scheme: /dev/nor0, which has the
initial root on it. Next up is foo.iso.gz, which is mounted read only
on md0. Next up is geom_uzip, which presents the device as md0.uzip,
which finally gets mounted as root.
So would df show:

Filesystem     1024-blocks    Used  Avail Capacity  Mounted on
/dev/nor0             4096    4096      0     110%  /
/dev/md0.uzip        16000   16000      0     110%  /

or

Filesystem     1024-blocks    Used  Avail Capacity  Mounted on
/dev/nor0             4096    4096      0     110%  /.old_root
/dev/md0.uzip        16000   16000      0     110%  /

and if we had one more layer on nand:

Filesystem     1024-blocks    Used  Avail Capacity  Mounted on
/dev/nor0             4096    4096      0     110%  /
/dev/md0.uzip        16000   16000      0     110%  /
/dev/nand0          320000  300000  20000      82%  /

or

Filesystem     1024-blocks    Used  Avail Capacity  Mounted on
/dev/nor0             4096    4096      0     110%  /.old_root/.old_root
/dev/md0.uzip        16000   16000      0     110%  /.old_root
/dev/nand0          320000  300000  20000      82%  /

is the question I'm asking...

Right now you can mostly do a pivot-root-like thing by having init do
a chroot very early, possibly after executing a simple rc script to
put the second level root system online. init_script gets run very
early, followed by a chroot to init_chroot, followed by a mount of
devfs on /dev if necessary. However, when you do this, you often end
up with weird looking df output, since / isn't really / to df.

Anyway, the fact that we have a decoupled fork/exec really is what led
me to ask the question. It is useful to run arbitrary code between
the two, even if you usually run the same code... sometimes you want
to be different. I was thinking that this might be the same way here.
But, as you rightly point out, maybe there's too much complexity in
doing that and simpler is better.

Warner
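
For what it's worth, the second of the two df layouts asked about above (each earlier root re-homed under /.old_root of the one mounted over it) can be sketched in a few lines of Python. This is only an illustration of the naming question, assuming the /.old_root convention from the example tables; the function name old_root_mount_points and its old_root parameter are hypothetical, and nothing here reflects what the kernel or df(1) actually does.

```python
def old_root_mount_points(devices, old_root=".old_root"):
    """Return {device: mount point} for a stack of root file systems.

    `devices` is given in mount order: first-level root first, final
    root last. Hypothetical convention (taken from the df example in
    the message): every time a new root is mounted, the previous root
    moves to /<old_root> of the new one.
    """
    last = len(devices) - 1
    table = {}
    for level, dev in enumerate(devices):
        hops = last - level  # how many roots were mounted after this one
        table[dev] = "/" + "/".join([old_root] * hops) if hops else "/"
    return table


if __name__ == "__main__":
    stack = ["/dev/nor0", "/dev/md0.uzip", "/dev/nand0"]
    for dev, where in old_root_mount_points(stack).items():
        print(f"{dev:<15} {where}")
```

With the first layout every entry would simply report "/", which is exactly why the older roots become hard to tell apart in df output.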