Date:      Mon, 17 Jul 2000 12:00:39 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc:        Robert Watson <rwatson@FreeBSD.ORG>, Warner Losh <imp@village.org>, Kelly Yancey <kbyanc@posi.net>, Dan Nelson <dnelson@emsphone.com>, Adrian Chadd <adrian@FreeBSD.ORG>, freebsd-arch@FreeBSD.ORG
Subject:   Re: DEVFS, the complete picture (Was: Re: SysctlFS)
Message-ID:  <397357D7.167EB0E7@elischer.org>
References:  <2365.963743367@critter.freebsd.dk>

Poul-Henning has some well-meaning ideas, but I disagree
with him on several topics.

1/ jail/chroot cannot be ignored.
See my email on cdevs using a "60 byte" major number
for a solution to this.

2/ The root mount problem can be easily solved if you allow
the kernel to open devices by name from the devfs namespace
without first mounting the device in user space.
I did this in the code he and sos deleted, and it worked just fine.



Poul-Henning Kamp wrote:
> 
> OK, now I finally have time to sit down and write an email with the
> complete picture about devfs.
> 
> For a moment, disregard jails and rootmounts and let us just look at
> cloning.

They cannot be ignored, as they represent a significant usage model.
Many programs expect a working /dev/tty, for example.

> 
> Cloning means that a device driver doesn't have to call make_dev()
> on all potential instances up front.
> 
> This makes most difference for pseudo-devices, tun, ppp, slip, pty,
> md bpf and so on, but other "actual" drivers like fd could use it
> as well to avoid calling make_dev() for every conceivable format
> of floppydisk.
> 
> Implementing cloning without devfs would be a gross hack: we would
> have to magically notice that /dev was searched and nothing found,
> and I think we might just as well forget everything about that idea.
> 
> Implementing cloning with devfs is simple:

I agree with what he says.

> 
>     Device-drivers can call devfs during their initialization and
>     register a "clone()" function with devfs.  (They obviously have
>     to deregister it again at detach time).
> 
>     When devfs::VOP_LOOKUP() fails to find the name it is being told
>     to look for, it will call all registered clone() routines
>     successively with the sought after name as argument.
> 
>     Each driver clone routine examines the name, and if it can
>     instantiate a device of that name, it does so with make_dev()
>     and returns EEXIST.  If it cannot, it returns 0.  If it
>     can determine for good that the name should not exist at this
>     time, it returns ENOENT.
> 
>     If a clone routine returns EEXIST, devfs::VOP_LOOKUP()
>     immediately retries the lookup, and returns the result.
> 
>     If a clone routine returns ENOENT, devfs::VOP_LOOKUP() fails
>     with ENOENT;
> 
>     When a clone routine returns 0, devfs::VOP_LOOKUP() calls the
>     next clone routine in turn.
> 
>     If when all clone routines have been called none of them have
>     instantiated, devfs::VOP_LOOKUP() returns ENOENT;
> 
>     The dev_t's created this way are not special in any way, all normal
>     rules and rights apply.  The only thing special about this is
>     the "lazy creation" of dev_t's.
> 
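The lookup rules quoted above can be sketched in userland C. This is
only a toy model under stated assumptions: toy_make_dev(),
clone_register(), toy_lookup() and tun_clone() are hypothetical names
invented for illustration, the device table is a plain array rather
than anything devfs really uses, and the standard errno constant
EEXIST stands for the "created it" return value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_DEVS   16
#define MAX_CLONES 4
#define NAME_LEN   32

/* Stand-in for the set of names that make_dev() has created so far. */
static char devs[MAX_DEVS][NAME_LEN];
static int ndevs;

/* Hypothetical handler type: returns EEXIST, ENOENT or 0. */
typedef int (*clone_fn)(const char *name);
static clone_fn clones[MAX_CLONES];
static int nclones;

/* Toy replacement for make_dev(): it only records the name. */
int toy_make_dev(const char *name)
{
    if (ndevs == MAX_DEVS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(devs[ndevs++], name);
    return 0;
}

/* A driver would call this at initialization time. */
int clone_register(clone_fn fn)
{
    if (nclones == MAX_CLONES)
        return -1;
    clones[nclones++] = fn;
    return 0;
}

int dev_exists(const char *name)
{
    for (int i = 0; i < ndevs; i++)
        if (strcmp(devs[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 0 if the name resolves, ENOENT otherwise. */
int toy_lookup(const char *name)
{
    assert(name != NULL);
    if (dev_exists(name))
        return 0;
    for (int i = 0; i < nclones; i++) {
        int rc = clones[i](name);
        if (rc == EEXIST)       /* handler created it: retry the lookup */
            return dev_exists(name) ? 0 : ENOENT;
        if (rc == ENOENT)       /* handler says the name must not exist */
            return ENOENT;
        /* rc == 0: not this driver's name, ask the next handler */
    }
    return ENOENT;              /* nobody instantiated anything */
}

/* Example handler: creates "tun0" through "tun9" on first lookup. */
int tun_clone(const char *name)
{
    if (strncmp(name, "tun", 3) != 0)
        return 0;
    if (name[3] < '0' || name[3] > '9' || name[4] != '\0')
        return ENOENT;
    return toy_make_dev(name) == 0 ? EEXIST : ENOENT;
}
```

After clone_register(tun_clone), looking up "tun3" creates the entry
lazily and succeeds, while "tunx" is vetoed by the handler and "null"
falls through every handler; both of those fail with ENOENT.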
> Next, let us look at the rootfs:
> 
> Today when we boot a FreeBSD system, various magic code finds and
> mounts a root filesystem from which we execute /sbin/init (and the
> rest becomes history).
> 
> A part of this ha^h^hmagic, is to take a device name, and come up
> with a vnode from which we can mount it, despite the fact that we
> have no filesystems mounted which can instantiate that vnode.
> Rather hackish, all in all.

Devfs as it is now has routines to do this.

> 
> Other magic code will do similar gyrations to mount a NFS root
> filesystem.

Since you don't need a device to mount an NFS filesystem, this
is not directly relevant.

> 
> This obviously is a chicken and egg issue, and there is probably
> no solution which is universally acceptable.  My personal preference
> is somewhat in the direction of what AIX have done, but with some
> slight modifications:
> 
>     Kernel initializes, probes devices and all that.
> 
>     Kernel mounts a devfs instance on /
> 
>     Kernel mounts a preloaded (or compiled in) md(4) instance
>     in /bootfs
> 
>     Kernel executes /bootfs/init
> 
>     /bootfs/init examines the environment to find the kind of desired
>     root filesystem.
> 
>         nfs: /bootfs/init will initialize a network interface (using
>              DHCP for instance) and union mount (not unionfs!) 
>              the root filesystem on /
> 
>         ufs: /bootfs/init will execute "/bootfs/fsck -p $device", and
>              afterwards unionmount (still not unionfs!) the 
>              device on /
> 
>         others: Whatever is needed.
> 
>     After mounting the desired root filesystem, /bootfs/init does an
>     execl("/sbin/init", "/sbin/init", (char *)NULL); so that the "real" init(8)
>     is started as pid==1 as required.
> 
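The decision that /bootfs/init makes in the scheme quoted above can be
sketched as a small C function. It does not mount or exec anything; it
only produces the ordered list of steps for a given root filesystem
kind, so the flow can be read and tested in userland. The function
name plan_root() and the step wording are assumptions made for
illustration, not part of the proposal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill buf with the ordered steps for the given root filesystem kind.
 * Returns 0 on success, -1 for a kind this sketch does not know
 * (the "others: whatever is needed" case) or if buf is too small.
 */
int plan_root(const char *kind, const char *device, char *buf, size_t len)
{
    int n;

    assert(kind != NULL && device != NULL && buf != NULL);
    if (strcmp(kind, "nfs") == 0)
        n = snprintf(buf, len,
            "configure network; union mount NFS root on /; "
            "exec /sbin/init");
    else if (strcmp(kind, "ufs") == 0)
        n = snprintf(buf, len,
            "/bootfs/fsck -p %s; union mount %s on /; exec /sbin/init",
            device, device);
    else
        return -1;
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling plan_root("ufs", "/dev/da0s1a", buf, sizeof buf) yields the
fsck, union mount and exec steps in that order; the device name here
is just an example value.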
> I see many advantages to this scheme, the main thing is that a lot
> of h0h0magic code moves from the kernel into userland.
> 
> The /bootfs md(4) instance can be kept around, it will be very small,
> but it can also be unmounted and if our VM system is taught how to,
> the RAM can be recycled.
> 
> This scheme will also take all the pain out of things like raid-5
> rootfs:  No more kernel h0h0magic code needed, just add the vinum
> program to /bootfs and DTRT.

This seems overly complicated to me.

> 
> /bootfs/init could conveniently be a shell script btw.
> 
> Finally, jails:
> 
> The only reason there could ever be to mount a devfs in a jail
> partition is to get access to the cloning facility, mainly for
> ptys.  For the /dev/null, /dev/zero etc cases, a good old-fashioned
> mknod(8) will do just fine.  Remember: the main reason for devfs
> is to cater for dynamic devices, the main thing we don't want to
> see pop up in jails is dynamic devices.

If we use a "SYMLINK-LIKE" major number replacement
that links an on-disk device inode to the current devfs
canonical namespace, then no extra devfs's need to be mounted,
and the cloning facilities in devfs will be available
to any inode that links to a cloning device.


This implies that the BASE SYSTEM is taught how to handle
"string" type major numbers and look them up in
the devfs by name, or search through the dev_t nodes
in a non-devfs kernel to find the appropriate driver and
cookie (minor number).
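
A minimal sketch of that lookup, in userland C. Everything here is an
assumption made for illustration: the struct layouts, the table
contents (driver names and minor values are made up), and the function
name resolve_cdev(). The 60-byte name field follows the "60 byte"
major number mentioned earlier. The on-disk node supplies the name and
the permissions; the canonical namespace supplies the driver and
cookie.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in the devfs canonical namespace. */
struct devfs_entry {
    const char *name;    /* canonical name */
    const char *driver;  /* driver that owns the device */
    int         minor;   /* driver cookie */
    unsigned    mode;    /* devfs default permissions */
};

/* Hypothetical on-disk device inode: holds a name, not a number. */
struct disk_cdev {
    char     target[60]; /* string "major number" */
    unsigned mode;       /* permissions stored on disk */
};

/* Result of resolving an on-disk node. */
struct resolved {
    const char *driver;
    int         minor;
    unsigned    mode;
};

/* Illustrative table; the values are not real FreeBSD numbers. */
static const struct devfs_entry namespace_tab[] = {
    { "null",  "mem",     1, 0666 },
    { "zero",  "mem",     2, 0666 },
    { "ttyv0", "syscons", 0, 0600 },
};

/* Returns 0 and fills *out, or ENOENT if the name is unknown. */
int resolve_cdev(const struct disk_cdev *node, struct resolved *out)
{
    size_t n = sizeof namespace_tab / sizeof namespace_tab[0];

    assert(node != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(node->target, namespace_tab[i].name) == 0) {
            out->driver = namespace_tab[i].driver;
            out->minor  = namespace_tab[i].minor;
            out->mode   = node->mode;  /* on-disk permissions win */
            return 0;
        }
    }
    return ENOENT;
}
```

An on-disk node naming "ttyv0" with mode 0660 resolves to the syscons
entry but keeps its own 0660 rather than the devfs default of 0600.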

By extension, /dev could remain an on-disk
item whose entries all link into /devfs, which would solve some
of the screams for persistence, because the permissions and
ownerships would be taken from the cdev nodes on the disk.

If you went to /devfs you would get the default version of
permissions and such.


Basically I'm suggesting SUPPLEMENTING devfs with
an alternate method of reaching the devices, which uses the
devfs canonical namespace as a linking mechanism.

> 
> So the devfs vs jail issue almost entirely boils down to "what do
> we do about ptys in jails" and considering that it actually works
> now in "the good old way", I frankly can't see much reason to
> not just continue that way.  Few jails are pty intensive anyway.
> 
> Summary:
> 
> 1. Forget about jails in the context of devfs, we don't need it.
> 
> 2. We can argue if we should unionmount the "real root" over a
>    devfs, or if we should mount devfs on /dev.  Both arguments
>    have some amount of merit: The former is cleaner, the latter
>    is more like it used to be.
> 
> 3. Cloning, while not strictly a must, is highly desirable.
> 
> 

-- 
      __--_|\  Julian Elischer
     /       \ julian@elischer.org
    (   OZ    ) World tour 2000
     ;_.---._/  presently in:  Budapest
            v

