Date:      Wed, 18 Apr 2001 14:23:19 -0400 (EDT)
From:      Robert Watson <rwatson@FreeBSD.ORG>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        Julian Elischer <julian@elischer.org>, Poul-Henning Kamp <phk@critter.freebsd.dk>, Kirk McKusick <mckusick@mckusick.com>, Rik van Riel <riel@conectiva.com.br>, freebsd-hackers@FreeBSD.ORG, David Xu <bsddiy@21cn.com>
Subject:   Re: vm balance
Message-ID:  <Pine.NEB.3.96L.1010418141424.2462O-100000@fledge.watson.org>
In-Reply-To: <200104181811.f3IIBJp25644@earth.backplane.com>

On Wed, 18 Apr 2001, Matt Dillon wrote:

>     If a device or file can be mmap()'d, then the VM Object acts as the
>     cache layer for the object.  We would in fact be able to remove nearly
>     *ALL* the caching crap from *ALL* the filesystem code.  Filesystem
>     code would be responsible for low level I/O operations and meta ops
>     (VOPs) only and not be responsible for any caching of file data.  The
>     filesystem would still potentially be responsible for caching things
>     like bitmaps and such, but it could use a struct file for the backing
>     device and get it for free (the backing device is mmapable and thus
>     would have a VM Object layer, so you get the bitmap caching for free).

Does this give you a cache coherence problem if the file system itself
invokes data writes on files?  Consider the UFS quota and extended
attribute cases: here, the file system will invoke VOP_WRITE() on its
vnodes to avoid understanding file system internals, so you can have such
operations shared across file systems using UFS.  If there is caching
happening above VOP_WRITE(), will changes get propagated up the stack?  Or
does VOP_WRITE() change so that it talks to the memory object which then
talks to VOP_REALLYWRITE()?
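
A minimal user-space sketch of the coherence question above, not kernel
code.  Every name in it (backing_store, mem_object, really_write,
object_read, object_write) is hypothetical; really_write() merely plays
the role the message gives to VOP_REALLYWRITE().  It assumes a
write-through cache, which is only one of several possible designs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 16

/* Hypothetical stand-in for the file's on-disk data. */
struct backing_store {
    char data[OBJ_SIZE];
};

/* Hypothetical stand-in for a VM object caching that file. */
struct mem_object {
    struct backing_store *store;
    char cached[OBJ_SIZE];
    int valid;                  /* nonzero once cached[] has been filled */
};

/* Lowest layer, playing the role of VOP_REALLYWRITE(). */
static int really_write(struct backing_store *bs, size_t off,
                        const char *buf, size_t len)
{
    if (off > OBJ_SIZE || len > OBJ_SIZE - off)
        return -1;
    memcpy(bs->data + off, buf, len);
    return 0;
}

/* Fill the cache from the store on first use. */
static void object_fill(struct mem_object *mo)
{
    assert(mo->store != NULL);
    if (!mo->valid) {
        memcpy(mo->cached, mo->store->data, OBJ_SIZE);
        mo->valid = 1;
    }
}

/* Reads are always served from the cache. */
static int object_read(struct mem_object *mo, size_t off,
                       char *buf, size_t len)
{
    if (off > OBJ_SIZE || len > OBJ_SIZE - off)
        return -1;
    object_fill(mo);
    memcpy(buf, mo->cached + off, len);
    return 0;
}

/* Coherent write: update the cache first, then write through. */
static int object_write(struct mem_object *mo, size_t off,
                        const char *buf, size_t len)
{
    if (off > OBJ_SIZE || len > OBJ_SIZE - off)
        return -1;
    object_fill(mo);
    memcpy(mo->cached + off, buf, len);
    return really_write(mo->store, off, buf, len);
}
```

In this model, a caller that invokes really_write() directly after the
cache has been filled leaves object_read() returning the old bytes,
while a caller that goes through object_write() keeps both layers in
step.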

Also, what implications does this have for security-oriented revocation? 
Memory mapping has always been a problem for revocation, but a number of
interesting pieces of work have been done wherein access to a file is
revoked, resulting in EPERM being returned from future reads.  In fact, I
believe Secure Computing even contracted with BSDI to have support for
some sort of virtual memory revocation service to get written -- in MAC
environments, a label change on a file can result in future operations
failing.  Many third-party security extensions on various platforms
implement some sort of revocation service -- while it hasn't been part of
the base OS in many cases, this is still a relevant audience.
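
A small user-space model of why mappings complicate revocation.  It is
an illustration under stated assumptions, not any existing revocation
service: file_obj, checked_read, map_object and revoke_access are all
hypothetical names, and a plain pointer stands in for a memory mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical file object with a revocation flag. */
struct file_obj {
    const char *data;
    size_t len;
    int revoked;
};

/* Read path: every call re-checks the flag, so revocation takes effect. */
static long checked_read(struct file_obj *fo, size_t off,
                         char *buf, size_t len)
{
    assert(fo != NULL);
    if (fo->revoked)
        return -EPERM;
    if (off >= fo->len)
        return 0;
    if (len > fo->len - off)
        len = fo->len - off;
    memcpy(buf, fo->data + off, len);
    return (long)len;
}

/* "Mapping": the access check happens once, when the pointer is handed out. */
static const char *map_object(struct file_obj *fo)
{
    return fo->revoked ? NULL : fo->data;
}

/* Revocation only flips the flag; it does not chase existing mappings. */
static void revoke_access(struct file_obj *fo)
{
    fo->revoked = 1;
}
```

After revoke_access(), checked_read() fails with EPERM, but a pointer
obtained earlier from map_object() still reaches the data.  Closing that
gap is the job a virtual memory revocation service would have to do.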

Also, however this is implemented, it would be nice to consider supporting
stateful access to devices: i.e., dev_open() returns a state reference
that is fed into future operations, so that pseudo-devices emulating
multi-instance devices from other platforms can operate correctly.  In my
mind, for this to work with file descriptor passing, either the open file
record needs to hold the state, and be passed into operations (this is
what Linux does -- all file system operations accept an open file entry
pointer, allowing vmmon, for example, to determine which session is in
use), or we need a more general state management technique.  In any case,
one thing this means is that if operations are pushed through a virtual
memory object, different "instances" must have different objects...
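
A rough sketch of the first option, with the open file record holding
the per-instance state.  Apart from dev_open(), which is named above,
everything here is a hypothetical illustration (session, open_file,
dev_bump, dev_close); it is neither the FreeBSD nor the Linux interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-instance device state. */
struct session {
    int id;
    int counter;
};

/* Hypothetical open file record carrying the state reference. */
struct open_file {
    struct session *state;
};

static int next_session_id = 1;

/* Each open allocates its own session and stores it in the record. */
static int dev_open(struct open_file *fp)
{
    struct session *s = calloc(1, sizeof(*s));

    if (s == NULL)
        return -1;
    s->id = next_session_id++;
    fp->state = s;
    return 0;
}

/* Operations receive the record, so they can find the right session. */
static int dev_bump(struct open_file *fp)
{
    assert(fp->state != NULL);
    return ++fp->state->counter;
}

/* Closing releases the session that belonged to this open. */
static void dev_close(struct open_file *fp)
{
    free(fp->state);
    fp->state = NULL;
}
```

Two separate opens get independent sessions.  A passed descriptor would
refer to the same open_file record and therefore to the same session,
which is the behaviour a multi-instance pseudo-device needs.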

I may be off-base on some points here based on a lack of expertise on the
device and vm sides, but my feeling is that there are a lot of
implications to this type of change, and we want to be careful not to
preclude a number of potential future development directions, especially
when it comes to security work and emulation.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Project
robert@fledge.watson.org      NAI Labs, Safeport Network Services

