Date: Wed, 18 Apr 2001 12:29:50 -0400 (EDT)
From: Robert Watson <rwatson@FreeBSD.ORG>
To: Poul-Henning Kamp <phk@critter.freebsd.dk>
Cc: Kirk McKusick <mckusick@mckusick.com>, Julian Elischer <julian@elischer.org>, Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG, Matt Dillon <dillon@earth.backplane.com>, David Xu <bsddiy@21cn.com>
Subject: Re: vm balance
Message-ID: <Pine.NEB.3.96L.1010418115558.2462H-100000@fledge.watson.org>
In-Reply-To: <38778.987605692@critter>
On Wed, 18 Apr 2001, Poul-Henning Kamp wrote:

> I have not examined the full details of doing the shift yet, but it is
> my impression that it actually will reduce the amount of code
> duplication and special casing.
..
> The only places we will need new magic is
> open, which needs to fix the plumbing for us.
> mmap, which may have to be added to the fileops vector.
>
> The amount of special-casing code this would remove from the vnode
> layer is rather astonishing.
>
> If we merger vm-objects and vnodes without taking devices out of the
> mix, we will need even more special-case code for devices.

Let me expand a bit on what I want to object to, and then comment a bit on what I have mixed feelings about but am not actively objecting to.

I believe it is necessary to retain a reference to the vnode used to access the device in f_data, and an f_type of DTYPE_VNODE. This is used extensively with ttys, where it is desirable to open /dev/ttyfoo and then perform file system operations on it, such as fchflags(), fchmod(), fchown(), revoke(), et al. These rely on reaching the vnode via the open file entry associated with the file descriptor designated by the invoking process. This behavior is needed for a variety of race-free operations at login, et al.

Changing this would require *extensive* modification to the syscall service layer (that is, what sits above VFS). Assuming the modifications were made so that the fileops array provided these services (making the struct file the entire abstraction, hiding VFS from the system call service layer), you've now completely rewritten the large majority of system calls. You've also introduced a whole new category of inter-abstraction synchronization that must occur whenever a change is made to any abstraction (e.g., adding ACLs, MAC, ...).

So it seems to me that access to the vnode must be maintained in struct file, and that we cannot totally replace references to the vnode with references to, for example, the device abstraction.
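To illustrate why the vnode has to stay reachable from struct file, here is a user-space sketch (not kernel code) of an fchmod()-style operation. The simplified struct file and struct vnode, the DTYPE_DEVICE type, struct device, and sketch_fchmod() are all hypothetical stand-ins. The only point is that once f_data refers to something other than a vnode, the file system attributes can no longer be reached from the descriptor.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; not the real FreeBSD definitions. */
#define DTYPE_VNODE   1     /* f_data points at a struct vnode */
#define DTYPE_DEVICE  2     /* hypothetical: f_data points at a device */

struct vnode {
        unsigned int v_mode;    /* permission bits kept by the file system */
};

struct device {
        const char *d_name;     /* hypothetical device handle */
};

struct file {
        int   f_type;           /* which kind of object f_data refers to */
        void *f_data;           /* vnode, device, socket, ... */
};

/*
 * Hypothetical fchmod() back end: the only route to the file system
 * attributes is through a vnode hanging off the open file.
 */
int
sketch_fchmod(struct file *fp, unsigned int mode)
{
        struct vnode *vp;

        if (fp->f_type != DTYPE_VNODE)
                return (EINVAL);        /* no vnode: nothing to chmod */
        vp = fp->f_data;
        vp->v_mode = mode & 07777;      /* keep only permission bits */
        return (0);
}
```

With f_type set to DTYPE_VNODE the call updates v_mode; with the hypothetical DTYPE_DEVICE it can only fail with EINVAL, which is the situation a tty opened purely through a device object would be in.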
So with these assumptions in place, it's still possible to consider what you were suggesting: replacing the vnode fileops array with a device fileops array, so that these calls would be short-circuited directly to the device abstraction rather than passing through the VFS abstractions on the way. In some ways, this makes sense: many of the device services map poorly onto the file-like abstraction of the vnode. For example, devices may have a notion of a stateful seek position: tape drives really *do* seek to a particular location, where the next read or write must be performed. Similarly, some devices really do act like streaming data sources or sinks. Pseudo-devices in particular may behave much more like sockets, with a notion of a discrete transmission unit, a maximum transmission unit, or addressability. Imagine being able to open a device representing a bus and use socket addressing calls to set the bus address being targeted: with /dev/usb0 you could say "address the following messages to USB address 4", or you could open /dev/ed0, set the target address of the device instance to an Ethernet address, and send.

We already have this problem to some extent with sockets: we use the file system vnode for two purposes, first as a namespace in which to identify the IPC object, and second as a means of storing protection properties. It's arguable that devices might work that way too, which I think is what you're asserting. I'm not strictly opposed to this viewpoint, but it makes me wonder a bit about the current structuring of that whole section of the kernel. To me, a vnode really does seem like a decent abstraction of the file system concept. The socket seems like a less decent abstraction of the IPC concept, but a better abstraction of a send/receive stream. All of this is complicated by long-standing interfaces and notions about how the abstractions are to be used.
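A rough sketch of that compromise, using simplified and hypothetical types: f_type stays DTYPE_VNODE and f_data still points at the vnode (so f* calls keep working), but f_ops can be swapped from a vnode-style table to a device table so that reads bypass the VFS layer. The tape-like device keeps its own position, standing in for the stateful seeking described above. vnops_sketch, devops_sketch, file_read(), struct tape, and all struct members are invented for illustration; the real fileops vector has more entries and different signatures.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-ins; not the real FreeBSD structures. */
#define DTYPE_VNODE 1

struct file;

struct fileops {
        /* Returns the number of bytes transferred. */
        long (*fo_read)(struct file *fp, char *buf, size_t len);
};

struct tape {                   /* hypothetical tape-like device */
        const char *t_media;    /* data on the "tape" */
        size_t      t_pos;      /* position maintained by the device itself */
        int         t_calls;    /* reads that reached the driver */
};

struct vnode {
        struct tape *v_rdev;    /* device behind this device vnode */
        int          v_vfs_calls; /* reads that went through the "VFS" path */
};

struct file {
        int                   f_type;  /* stays DTYPE_VNODE in both cases */
        void                 *f_data;  /* always the vnode, so f* calls work */
        const struct fileops *f_ops;   /* which dispatch table to use */
};

/* Device read: uses the device's own notion of position. */
static long
tape_read(struct tape *tp, char *buf, size_t len)
{
        size_t avail = strlen(tp->t_media) - tp->t_pos;
        size_t n = len < avail ? len : avail;

        memcpy(buf, tp->t_media + tp->t_pos, n);
        tp->t_pos += n;
        tp->t_calls++;
        return ((long)n);
}

/* Today's path: file -> vnode layer -> device. */
static long
vn_read_sketch(struct file *fp, char *buf, size_t len)
{
        struct vnode *vp = fp->f_data;

        vp->v_vfs_calls++;      /* stands in for the VFS layering */
        return (tape_read(vp->v_rdev, buf, len));
}

/* Proposed short circuit: file -> device, skipping the VFS layer. */
static long
dev_read_sketch(struct file *fp, char *buf, size_t len)
{
        struct vnode *vp = fp->f_data;  /* still there for fchmod() etc. */

        return (tape_read(vp->v_rdev, buf, len));
}

const struct fileops vnops_sketch  = { vn_read_sketch };
const struct fileops devops_sketch = { dev_read_sketch };

/* Generic read: dispatch through whatever table the file carries. */
long
file_read(struct file *fp, char *buf, size_t len)
{
        return (fp->f_ops->fo_read(fp, buf, len));
}
```

Swapping f->f_ops from &vnops_sketch to &devops_sketch changes only the read path; f_type and f_data are untouched, which is what keeps the tty-style f* operations working.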
I guess I'd rather see it look something like this:

                    +-----------------+
                    | file descriptor |
                    +--------+--------+
                             |
                +------------+------------+
                | kernel object reference |
                +------------+------------+
                             |
           +-----------------+-----------------+
           |                 |                 |
         vfile            kqueue            vstream
                                               |
                             +----------+------+-----+-----------+
                             |          |            |           |
                        IPC Socket    FIFO         Pipe    Stream Device

(note the above, and below, are highly fictional)

Where "kernel object reference" is the equivalent of today's "struct file", "vfile" is the equivalent of today's "vnode", and "vstream" is a new abstraction for discrete or streamed, ordered, message/event-oriented services.

Devices might choose to appear as a file-like service: offering an ordered data address space in which all points have fairly similar properties, providing a memory mapping service (possibly a generic vfile pager), allowing data to be read or written arbitrarily, and so on. They could also choose to appear as a stream-oriented service offering send/receive primitives, possibly as a stream with discrete message boundaries, with addressing management, etc. Ideally, I'd actually rather kqueue fit under an abstraction like that, although it's currently a first-class object. You could imagine:

    struct kernel_object {
            struct vnode    *ko_vp;         /* Optional vnode that provided
                                             * access to the object. */
            int              ko_type;       /* Which service abstraction. */
            union {
                    struct vfile    *kso_vfile;
                    struct vstream  *kso_vstream;
                    struct kqueue   *kso_kqueue;
            } ko_service;
    };

The optional vnode (possibly NULL) is maintained so that the caller can perform file system f* operations on the file descriptor pointing at the object; it wouldn't apply to things like pipes, where there is no file system object. Presumably most operations would go either to ko_vp or to ko_service; some might be propagated to both, such as open and close.

Another thing to keep in mind, btw, is that security services are poorly divided between the device system and the file system right now.
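Returning for a moment to the kernel_object proposal, here is a minimal sketch of the routing it describes: f* file system operations go only to the optional ko_vp (and fail when it is NULL, as for a pipe), while close is propagated to both the vnode and the service. The ko_type constants, the stub service structures with their open counters, and the functions ko_fchmod() and ko_close() are hypothetical, invented only so the idea compiles; this is not existing kernel code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stub objects: just enough state to observe the routing. */
struct vnode   { unsigned int v_mode; int v_opencount; };
struct vfile   { int vf_opencount; };
struct vstream { int vs_opencount; };
struct kqueue  { int kq_opencount; };

enum { KO_VFILE, KO_VSTREAM, KO_KQUEUE };     /* hypothetical ko_type values */

struct kernel_object {
        struct vnode    *ko_vp;         /* Optional vnode that provided
                                         * access to the object. */
        int              ko_type;       /* Which service abstraction. */
        union {
                struct vfile    *kso_vfile;
                struct vstream  *kso_vstream;
                struct kqueue   *kso_kqueue;
        } ko_service;
};

/* File system f* operations are routed only to the (optional) vnode. */
int
ko_fchmod(struct kernel_object *ko, unsigned int mode)
{
        if (ko->ko_vp == NULL)
                return (EINVAL);        /* e.g. a pipe: no file system object */
        ko->ko_vp->v_mode = mode & 07777;
        return (0);
}

/* Close is propagated to both the vnode (if any) and the service. */
void
ko_close(struct kernel_object *ko)
{
        if (ko->ko_vp != NULL)
                ko->ko_vp->v_opencount--;
        switch (ko->ko_type) {
        case KO_VFILE:
                ko->ko_service.kso_vfile->vf_opencount--;
                break;
        case KO_VSTREAM:
                ko->ko_service.kso_vstream->vs_opencount--;
                break;
        case KO_KQUEUE:
                ko->ko_service.kso_kqueue->kq_opencount--;
                break;
        }
}
```

A stream device opened via /dev would carry both a vnode and a vstream, so ko_fchmod() succeeds and ko_close() touches both; a pipe carries only the vstream, so ko_fchmod() returns EINVAL and ko_close() touches only the service.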
File system permissions are applied on device open and are relied upon by many consumers. In fact, one cool thing about using BPF via /dev/bpf is being able to give read/write access to unprivileged users. This doesn't work for a number of devices, which enforce their own protections in the open operation...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services