From owner-freebsd-arch  Fri Jan  1 14:03:11 1999
To: arch@FreeBSD.ORG
Subject: DEVFS, the time has come...
From: Poul-Henning Kamp <phk@FreeBSD.ORG>
Date: Thu, 31 Dec 1998 15:07:17 +0100
Message-ID: <94808.915113237@critter.freebsd.dk>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


... to make up our mind about it.

There are a number of options open for us, and to make sure we talk
about the substance let me stress once and for all that this discussion
should be about the >concept< of a devfs, not the code currently in
the tree.

Once consensus has been reached about what a DEVFS should be and how
it should behave, we can start to study the code in the tree to see if
it fits or what it will take to get a devfs that does.


DEVFS, HERITAGE, CONCEPT and ADVANTAGES:

I talked to Dennis Ritchie about the history of device nodes
(CHR/BLK) when I ate breakfast with him in New Orleans.  Originally
it was very crude but simple: Inode number 7 was the printer, 8
was the disk, and so on.  The root-dir was around 40 at the time.
This soon became a problem, so the device nodes were introduced,
and we've had them ever since.  Device nodes are a shortcut from
one namespace (filesystem) to another (cdevsw/bdevsw arrays of
device drivers).  There is according to Dennis Ritchie no compelling
reason to have a separate namespace for anything, if you could put
it in the filesystem (Plan 9, anybody?)

So what a devfs does is to remove the cdevsw and bdevsw namespaces
of device drivers and instead attach not device drivers but 
instances of devices into the filesystem directly.

The problems a DEVFS is trying to solve are the following:

1. Static mapping of major devices must be maintained.  This means
   that 3rd party drivers need to be catalogued and assigned numbers,
   and major numbers are a limited resource: there are only
   256 of them.

2. Dynamic creation of the needed device nodes, instead of magic
   shell scripts (MAKEDEV), such that the found devices are available
   in /dev, and if the device is not there on the next boot, the
   nodes in /dev are gone again.  This is a major tickbox item for
   people working on Plug&Play, Cardbus, PCMCIA and other dynamic
   configuration technologies.

3. Avoid the NFOOBAR definitions in the drivers.  A DEVFS would
   allow the device driver to attach sufficient information to the
   vnode that it can find both its "softc" structure and a unit
   number from the vnode.  Currently only the minor device number
   is available, and that only allows the unit number to be found;
   the softc struct must be found by taking (a subset of) the minor
   number as an array index.  DEVFS will also remove the need to
   check the validity of the minor number in the drivers.

4. Clone devices.  Rather than define 64 ptys in the system, the
   system should make new ones on demand.  This is hard, if not
   downright impossible, to do if you have to mknod something to
   be allowed to use the thing.

There is no doubt that the sources will be cleaner and have fewer
implicit cross dependencies with a well implemented DEVFS; for
instance the code in UFS which special-cases device nodes can
be removed.

There are some issues relating to devices and chroot jails, but they
are well understood and no major trouble to implement, and I hope
we can just ignore them in this discussion for now, as they are a
subset of the general problems, and present no new or unique aspects
of these problems.

I don't currently know of anybody disagreeing with any of the above,
but feel free to raise arguments against it if you have any.


The PROBLEM:

The sticky issue about DEVFS, at least in FreeBSD, is called
"persistence".

NON-PERSISTENT DEVFS:
	A "non-persistent" devfs boots up with all found devices
	visible, (probably mode 0600 root.wheel, 0700 for directories)
	and a script in /etc/dev.rc will contain the policy for
	the devices:

		chmod 660 cua*      ; chown uucp.dialer cua*
		chmod 600 tty[dil]* ; chown root.wheel tty[dil]*
		chmod 640 fd[0-9]*  ; chown root.operator fd[0-9]*
		...

	If you remove a device, it's gone.  No way to get it back
	short of a reboot, and it will be back after the next reboot
	if the driver is there and finds its hardware, so to remove
	a device as a policy, you would need to put the rm command
	in /etc/dev.rc.  If you "whiteout" a device, you can get
	it back with "undelete" and don't need a reboot for it.
	You can create symbolic and hard links and directories
	in the DEVFS, but they will be gone on the next reboot, so
	if you want them around all the time, put them in /etc/dev.rc.
	There is no need for any special userland tools.

	I believe this completely describes all aspects of a
	non-persistent DEVFS.
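
A minimal sketch of what such an /etc/dev.rc could look like, in plain
sh.  The DEVDIR and DRYRUN variables and the run, policy and
apply_policy helpers are assumptions made for illustration, not part of
any existing script; the three rules are the ones listed above.  The
owner and group are written with a colon, as POSIX chown specifies,
rather than the dot used above.

```shell
#!/bin/sh
# Hypothetical /etc/dev.rc sketch: the whole device policy lives here and
# is re-applied on every boot.  DEVDIR and DRYRUN exist only for this example.

DEVDIR=${DEVDIR:-/dev}
DRYRUN=${DRYRUN:-}

# run CMD ARGS...: execute the command, or only print it when DRYRUN is set.
run() {
    if [ -n "$DRYRUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# policy MODE OWNER:GROUP PATTERN: apply mode and ownership to every node
# in DEVDIR whose name matches the glob PATTERN.
policy() {
    mode=$1 owner=$2 pattern=$3
    for node in "$DEVDIR"/$pattern; do
        # An unmatched glob stays literal; skip it so that a device which
        # was not found at this boot is not an error.
        [ -e "$node" ] || continue
        run chmod "$mode" "$node"
        run chown "$owner" "$node"
    done
}

# The policy itself: one line per rule, wildcards included.
apply_policy() {
    policy 660 uucp:dialer   'cua*'
    policy 600 root:wheel    'tty[dil]*'
    policy 640 root:operator 'fd[0-9]*'
}
```

Ending the script with a call to apply_policy would apply the rules at
boot; setting DRYRUN=1 first prints the chmod and chown commands instead
of running them.  Because the patterns are evaluated on every run, a
cua device that shows up later is covered by the same rule.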


PERSISTENT DEVFS:

	A "persistent" devfs will use some kind of stable storage
	to track the devices with.  First time a device is found,
	the driver will suggest what mode, owner and group to set
	on the device.  If root goes into the filesystem and does

		chmod 600 cua*

	the currently found cua* devices will get this new mode,
	and will have that mode also after a reboot.  Including
	the case where for half a year the hardware is not there
	and then suddenly some hardware comes back that fits the
	bill.
	Once recorded in the persistence database, there is no way
	to say "restore defaults mode/owner/group" short of setting
	it manually.  If a new cua device appears, it will still
	come up with the default driver based permissions; the
	wildcard aspect of the above command is not recorded, so
	it is left to root to enforce his policies manually.
	If a device is removed, it will not come back after reboot,
	so undelete will have to work on removed as well as
	whiteout'ed devices, effectively making whiteout and unlink
	the same thing.
	You can create symbolic and hard links and directories
	in the DEVFS, and they will be there after reboot.
	Implementing the persistence in the filesystem is messy,
	intricate and will take up significant amounts of code.
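
The wildcard problem described above can be shown with a toy model in
plain sh.  Everything in it is hypothetical: the flat "name mode" file
format and the record_mode and mode_for helpers are invented for this
sketch and say nothing about how a real persistence database would be
stored.

```shell
#!/bin/sh
# Toy model of a persistence database: one "name mode" record per line.
# The file format and both helpers are assumptions made for illustration.

DB=${DB:-$(mktemp)}

# record_mode MODE NAME...: what the database would store after a chmod.
# The shell has already expanded any wildcard, so only names arrive here.
record_mode() {
    mode=$1
    shift
    for name in "$@"; do
        # Drop any earlier record for this name, then append the new one.
        awk -v n="$name" '$1 != n' "$DB" > "$DB.new"
        echo "$name $mode" >> "$DB.new"
        mv "$DB.new" "$DB"
    done
}

# mode_for NAME DEFAULT: the mode a device gets when it is found.
# A recorded mode wins; otherwise the driver's suggested DEFAULT is used.
mode_for() {
    recorded=$(awk -v n="$1" '$1 == n { print $2 }' "$DB")
    echo "${recorded:-$2}"
}
```

After "record_mode 600 cua0 cua1", which is all that "chmod 600 cua*"
amounts to once the shell has expanded it, "mode_for cua0 660" prints
600 but "mode_for cua2 660" prints 660: the new device falls back to
the driver default because the pattern itself was never stored.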

	SOME of the issues not addressed in this description:

	* format of persistence database:  ASCII file, binary file,
	  shadow inodes ?
	* how to manually list/edit the persistence database ? (tradeoff
	  between ASCII parsing code in the kernel vs. specialized
	  reading/writing/editing userland tools.)
	* modifying the persistence database for devices not currently
	  found.  (as above for specialized tools)
	* garbage collecting the persistence database. (ditto)
	* What happens if e.g. a symlink in the database collides with
	  a newfound device, which entry takes precedence ?
	* If the persistence database lives in a filesystem, how
	  does the kernel locate it at boot time ?


HISTORY:

We have had a DEVFS implementation in the tree for 32 months by
now.  That means from before 2.0.5 was released.  The reason we
still don't have a DEVFS as standard is that this persistence vs.
non-persistence has not been sorted out.  It is high time to
get this thing settled and move on.


OPINION:

My personal preference is to take a non-persistent DEVFS. 

I have never changed the mode or deleted something in a /dev
directory without it being a matter of policy.  I think any such
policy is far better expressed in a shell script run at boot time,
where I can use all the facilities of the shell to implement my
policy.

Having my policy in only one place (unless I myself choose to split
it), in a well known form (shell script), where I can put comments
on it, and even have it under version control makes me feel good.
In particular I like the idea of having wildcard names help make
sure my policy also covers any devices added later in time.

I can trust the contents of /dev to be in a known and well defined
state after a reboot, a state which is conceptually easy to understand
and readable in standard syntax for the user.  No new tools to
learn and know about.

I do not feel as confident this would be the case with a persistent
DEVFS.  I don't like the concept of "shadow databases" expressed
in pseudo-form through another database mechanism.

I would need to be able to edit or at the very least read the
persistence database (sound of agonized cries from AIX users heard
in the background).  How would I edit an entry in the persistence
database for a device I do not currently have in my system ? What
happens if I edit the database and somebody else does a chown at
the same time?  Can I add dormant entries to the database so that
any devices appearing later will be set according to my policy?
Can I use wildcards for it?  It sounds to me like it will be much
harder to implement a policy and enforce it, for a persistent DEVFS.

It is obviously out of the question to implement the full shell
syntax in the kernel, so either we need a special userland process
to translate to and from a standard format, or a special toolset
to list/edit the binary database.  We're in essence talking about
adding another namespace, a prospect that makes removing the
cdevsw/bdevsw namespace pretty pointless in my book.

How about the case where somebody tries out some gadget, forgets about
it for a number of months and buys some other gadget instead which
the same driver recognizes, then suddenly some old stuff appears
out of nowhere which may not even apply to that device, and since
the device is there in the database, not even the device driver
gets a chance to say what it feels about the issue ?  Or even
worse, the device was removed so the "new" hardware looks like
it doesn't work because nothing shows up in /dev ?  I'm not
fielding the support line on this issue.

The fact that it is so much simpler to express the functioning
of a non-persistent DEVFS, and that so many thorny issues are
tangled up in the persistent DEVFS, makes me think that any
advantages of a persistent DEVFS (I see none) are run over,
rolled flat, scraped up and thrown out by the Keep It Simple
Principle.

How many people would ever know the difference anyway?  Very few,
I presume.  I think most people stick with the default permissions,
and the few who don't probably know what they are doing.
They will therefore be perfectly capable of getting either of the
two models to do what they want, maybe with the same bias as me
that having a shell script to do it in is both cleaner and easier.

Summary:  I cannot see who in our user community will benefit
from persistence in DEVFS, I don't see what benefits it brings,
and I think it is overly complicated, hard to implement right
and error-prone in action.

[My only concern with a non-persistent DEVFS is the permissions on
device nodes that appear due to an event (e.g., a card insertion), and
I think that can be adequately addressed by having a flag for a DEVFS
mount that stops new nodes from automatically appearing in that
instance of DEVFS. -EE]

--
Poul-Henning Kamp             FreeBSD coreteam member
phk@FreeBSD.ORG               "Real hackers run -current on their laptop."
"ttyv0" -- What UNIX calls a $20K state-of-the-art, 3D, hi-res color terminal
