From owner-freebsd-hackers Thu Jul 27 13:13:21 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by hub.freebsd.org (Postfix) with ESMTP id BC3A737BD63;
	Thu, 27 Jul 2000 13:13:15 -0700 (PDT)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3])
	by fledge.watson.org (8.9.3/8.9.3) with SMTP id QAA10252;
	Thu, 27 Jul 2000 16:13:08 -0400 (EDT)
	(envelope-from robert@fledge.watson.org)
Date: Thu, 27 Jul 2000 16:13:08 -0400 (EDT)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: Isaac Waldron
Cc: freebsd-hackers@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: Writing device drivers (ioctl issue)
In-Reply-To: <005301bff73b$bf8a3460$0100000a@waldron.house>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, 26 Jul 2000, Isaac Waldron wrote:

> I started working on a port of FreeMWare/plex86 (www.plex86.org) to
> FreeBSD yesterday, and have run into a small problem. The basic idea is
> that I need to write a kernel module that implements some ioctls for a
> new pseudo-device that will eventually reside at /dev/plex86.
>
> The issue I'm running into is with the function I'm writing to handle
> the ioctls for the device. For one of the ioctls, the code needs to get
> some data from the file descriptor that was passed to the original call
> to ioctl(2). This is easily accomplished in Linux, because the file
> descriptor is passed as the second argument to the device_ioctl
> function.
>
> Is there an easy way to get at the same data (the file descriptor
> passed to ioctl(2) by the calling program, in a kernel-style "struct
> file *", not the standard "struct FILE *") in FreeBSD? Or will it be
> necessary to change the ioctl structure slightly and therefore need to
> change some of the higher-level functions in plex?
I ran into this same problem when modifying the vmmon VMWare driver for
FreeBSD to support multiple emulator instances.

FreeBSD's VFS does not have a concept of stateful file access: there are
opens and closes, but the VOP_READ/VOP_WRITE operations are not
associated with sessions. This influences the way in which drivers are
implemented for BSD (and platforms like it). For example, rather than
having one /dev/bpf with multiple "open" instances, we have
/dev/bpf{0,1,...}, and a process needing a session sequentially attempts
to open devices until it finds one that doesn't return EBUSY. The
driver, in this case, limits the number of open references on each
device to 1.

There are a number of possible solutions to this problem, including the
Linux solution of passing the file descriptor down the VFS stack so that
VFS layers can attach session information to it. In this manner,
VOP_READ/VOP_WRITE can determine which session is active and behave
appropriately. I dislike this solution: right now, file descriptors are
a property of the process and the ABI, and the VFS is unaware of them.
Stacked file systems also suggest that the single hook in the file
descriptor is insufficient to maintain per-layer session information.
It also makes a mess of access to files from within the kernel, where
file descriptors are not used.

My preferred solution, which I have actually hacked around with in a
kernel a bit, is to make the VFS provide (optional) stateful vnode
sessions. vop_open() would gain an additional call-by-reference
argument, probably a void **. When NULL, the caller would be requesting
a stateless vnode open, and all would be as today. When non-NULL, this
would allow the vnode provider to return a cookie/rock/void pointer to
state information for the session. Other VOPs would similarly accept
this cookie back, allowing the VOP provider to inspect it (if non-NULL)
and behave appropriately with state.
vop_close() could be used to release the cookie. This would give file
systems and callers the ability to make use of state optionally, without
violating the separation between file descriptors/open file records and
the VFS. It would also allow stacking to occur, as each vnode private
data layer/layered cookie struct could do the appropriate
transformations to obtain the right cookie for the next layer down.
I.e., there would be a sensible semantic for stacked file systems to
provide stateful access. My changes are incomplete, as I was working on
this on the plane, and comments on the idea would be welcome.

One thing this would allow is for us not to heavily replicate device
nodes in /dev for multi-instance virtual devices. The BPF example is a
useful one here: while the kernel currently supports dynamically
allocated BPF devices, /dev has to have BPF entries added manually. The
same goes for tunnel devices, et al. While a real devfs would fix this,
the semantic is also useful for drivers ported from Linux (and other
platforms with stateful vnode access) that expect to be able to open
/dev/vmmon and get a new, unique session. For /dev/vmnet, it means the
driver can detect multiple sessions on the same device and act
appropriately. In vmnet, each vmnet device acts like an Ethernet bridge
for the sessions open on it, so you can bind different VMWare sessions
to different virtual network segments, potentially with more than one
VMWare session per segment. Otherwise, you have to binary-modify VMWare
each time you run it to open a different /dev/{whatever}, or get the
developer to use a /dev/whatever{0,1,2,3,...} model, for which there is
much precedent in Linux (BPF, ttys, etc.).
Robert N M Watson

robert@fledge.watson.org              http://www.watson.org/~robert/
PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1
TIS Labs at Network Associates, Safeport Network Services

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message