Date:      Mon, 24 Jun 2002 14:51:32 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Harry Newton <harry_newton@telinco.co.uk>
Cc:        freebsd-hackers@freebsd.org
Subject:   Re: status of portalfs
Message-ID:  <3D179464.4538513D@mindspring.com>
References:  <86adpkih9j.fsf@basilisk.locus>

Harry Newton wrote:
> LINT says that the portal filesystem is 'known to be buggy'. I'm having a
> look at it, but can't get it to break ! Could someone give me an idea
> of the status of it, or where I could look ?

From the man page of "mount_portal":

     The portal daemon provides an open service.  Objects opened under the
     portal mount point are dynamically created by the portal daemon according
     to rules specified in the named configuration file.  Using this mechanism
     allows descriptors such as sockets to be made available in the filesystem
     namespace.

This basically introduces the same cache coherency problems
with read/write/mmap that you will have with any FS stacking,
because explicit coherency notification was removed as part
of the switch to the unified VM and buffer cache code, a long
time ago.

The problem is that there is a backing object behind your
backing object, so there are two vm_object_t's that refer
to the same data living in different pages (one in the upper
vnode, one in the lower vnode -- either in a stack, or in
the underlying FS which the descriptor being exported is a
reference into).
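
To make the "two objects, same data, different pages" case
concrete, here is a toy userspace model in C.  It is not kernel
code: toy_object, toy_disk, toy_read and toy_write are names
invented for this sketch, and each toy_object stands in for a
vm_object_t that keeps its own private copy of one page.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a VM object: one privately cached page. */
struct toy_object {
    char page[TOY_PAGE_SIZE];
    int  valid;   /* nonzero once the page has been filled */
};

/* The single true copy of the data (the "disk"). */
static char toy_disk[TOY_PAGE_SIZE];

/* Fill the object's page on first access, then serve from the cache. */
static const char *toy_read(struct toy_object *obj)
{
    if (!obj->valid) {
        memcpy(obj->page, toy_disk, TOY_PAGE_SIZE);
        obj->valid = 1;
    }
    return obj->page;
}

/*
 * Write through one object: its own page and the disk are updated,
 * but no other object caching the same data is told about it.
 */
static void toy_write(struct toy_object *obj, const char *data)
{
    assert(data != NULL);
    strncpy(obj->page, data, TOY_PAGE_SIZE - 1);
    obj->page[TOY_PAGE_SIZE - 1] = '\0';
    obj->valid = 1;
    memcpy(toy_disk, obj->page, TOY_PAGE_SIZE);
}

/* Returns 1 if the upper object is stale after a write via the lower. */
int toy_demo_stale(void)
{
    struct toy_object upper = {0}, lower = {0};

    memset(toy_disk, 0, sizeof(toy_disk));
    memcpy(toy_disk, "old", 4);

    toy_read(&upper);            /* upper now caches "old" */
    toy_write(&lower, "new");    /* lower and disk now hold "new" */

    return strcmp(toy_read(&upper), "new") != 0;
}
```

In this model the upper object keeps serving its old page because
nothing invalidates it, which is the shape of the problem
described above.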

This can't be easily resolved with the introduction of an
alias vm_object_t; in the first place, they are not reference
counted, and in the second place, modification notifications
are done by (effective) reverse lookup, which means that if
I have A and B with a pointer to the same mapping, the VM
system will only notify one, not both.  If you need an "in
the third place", it's that one of the most common bugs that
had to be beaten out of the unified VM and buffer cache code
was that of unintentional aliasing.  Intentional aliasing
would make such problems utterly impossible to locate or
debug.  So even if you could make the objects point at the
same pages, you'd lose things like copy-on-write faults,
which should be shared.
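
The reverse lookup point can be sketched the same way.  This is
a toy model under the assumption that each page carries exactly
one back-pointer to an owning object; toy_obj, toy_pg and
toy_page_modified are invented names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object that counts the notifications it receives. */
struct toy_obj {
    int notified;
};

/* Hypothetical page with a single back-pointer to one owner. */
struct toy_pg {
    struct toy_obj *owner;
};

/* Reverse lookup: go from the page to its owner and notify it. */
static void toy_page_modified(struct toy_pg *pg)
{
    assert(pg != NULL);
    if (pg->owner != NULL)
        pg->owner->notified++;
}

/* Returns how many of the two aliasing objects were not notified. */
int toy_missed_notifications(void)
{
    struct toy_obj a = {0}, b = {0};
    struct toy_pg pg = { .owner = &a };  /* back-pointer names A only */
    struct toy_pg *via_b = &pg;          /* B refers to the same page */

    toy_page_modified(via_b);            /* modification made via B */

    return (a.notified == 0) + (b.notified == 0);
}
```

With one back-pointer, only A is ever found by the lookup, so B is
never told, however the modification was made.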

[ Technically, the read/write operations in any stacked FS
  should really be implemented in terms of getpages/putpages
  in the implementation FS, but this is really not there, and
  wouldn't work anyway, because of the notification problem ]

Unfortunately, "msync" isn't going to do what you want.

One option is to make sure that when you mmap the portalfs
space, the mmap is implemented with read/write.  This
would involve a change to the "default VOPS" {get|put}pages
operations.  It's also very ugly because of where it does and
doesn't work.  But this was the approach taken in the "nullfs"
implementation, to mask the coherency problem, rather than
correcting it.  In theory, the "nullfs" should be able to use
the referenced VM objects of the underlying FS directly, since
it does not attempt non-linear remapping of address ranges,
but in reality, the vm_object_t's are referenced directly out
of the vnode pointer objects, rather than using an accessor
function in upper level code, so that the access could be
trapped and redirected.

Mapping {get|put}pages into underlying read/write is actually
exactly backwards, and kind of fossilizes incorrect behaviour
into permanency, but it works without a reorganization of the
VFS code.

The upshot of this is that you can use it as it is, so long
as you don't use any operations that operate on the backing
objects via the VM system, and expect coherency (e.g. no
mmap, no sendfile, etc.).
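
If you want something to poke at, a userspace check along these
lines is one way to look for read/write vs. mmap incoherency on
a given path.  It is only a sketch: mmap_sees_write is a name
made up here, it uses nothing beyond standard POSIX calls, and
whether an object under a portal mount can be mmap'ed at all is
an assumption you would have to test.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write(2) shows up through an existing shared
 * mapping, 0 if the mapping still shows the old bytes, and -1 if
 * the file could not be set up or mapped.
 */
int mmap_sees_write(const char *path)
{
    static const char before[] = "AAAAAAAA";
    static const char after[]  = "BBBBBBBB";
    const size_t len = sizeof(before) - 1;
    char *map;
    int fd, result = -1;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, before, len) != (ssize_t)len)
        goto out_close;

    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    /* Touch the mapping so the page is faulted in before the write. */
    if (memcmp(map, before, len) != 0)
        goto out_unmap;

    /* Overwrite the same bytes through the descriptor. */
    if (lseek(fd, 0, SEEK_SET) != 0)
        goto out_unmap;
    if (write(fd, after, len) != (ssize_t)len)
        goto out_unmap;

    /* Coherent if the mapping now shows the new bytes. */
    result = (memcmp(map, after, len) == 0) ? 1 : 0;

out_unmap:
    munmap(map, len);
out_close:
    close(fd);
    return result;
}
```

A return of 0 on some path would mean the mapping and the
descriptor disagree about the file contents, which is the symptom
to look for.  Note that the function truncates and overwrites the
file it is given, so point it at a scratch file.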

>  - Harry, who apologises if this isn't the right list.

Probably "FreeBSD-FS"... though it's also VM related.

-- Terry
