Date: Mon, 24 Jun 2002 14:51:32 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Harry Newton <harry_newton@telinco.co.uk>
Cc: freebsd-hackers@freebsd.org
Subject: Re: status of portalfs
Message-ID: <3D179464.4538513D@mindspring.com>
References: <86adpkih9j.fsf@basilisk.locus>

Harry Newton wrote:
> LINT says that the portal filesystem is 'known to buggy'. I'm having a
> look at it, but can't get it to break ! Could someone give me an idea
> of the status of it, or where I could look ?

From the man page of "mount_portal":

    The portal daemon provides an open service. Objects opened under
    the portal mount point are dynamically created by the portal daemon
    according to rules specified in the named configuration file. Using
    this mechanism allows descriptors such as sockets to be made
    available in the filesystem namespace.

This basically introduces the same cache coherency problems with read/write/mmap that you will have with any FS stacking, because explicit coherency notification was removed as part of the switch to the unified VM and buffer cache code, a long time ago.

The problem is that there is a backing object behind your backing object, so there are two vm_object_t's that refer to the same data living in different pages (one in the upper vnode, one in the lower vnode -- either in a stack, or in the underlying FS which the descriptor being exported is a reference into).

This can't be easily resolved by introducing an alias vm_object_t; in the first place, they are not reference counted, and in the second place, modification notifications are done by (effective) reverse lookup, which means that if I have A and B with a pointer to the same mapping, the VM system will only notify one, not both.

If you need an "in the third place", it's that one of the most common bugs that had to be beaten out of the unified VM and buffer cache code was unintentional aliasing. Intentional aliasing would make such problems utterly impossible to locate or debug.

So even if you could make the objects point at the same pages, you'd lose things like copy-on-write faults, which should be shared.
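
To make the notification problem concrete, here is a toy model in Python. It is only a sketch of the argument above, not the FreeBSD VM code: the names VMObject, ToyVM and demo are invented for illustration, and the "reverse lookup" is reduced to a dictionary that remembers a single owner per page.

```python
class VMObject:
    """Toy stand-in for a vm_object_t: a private page cache for one vnode."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # page index -> bytes cached by this object

    def invalidate(self, index):
        # Drop the cached copy so the next read goes back to the store.
        self.pages.pop(index, None)


class ToyVM:
    """Hypothetical VM: the reverse lookup knows one owner per page."""

    def __init__(self, store):
        self.store = dict(store)  # backing data: page index -> bytes
        self.owner = {}           # reverse lookup: page index -> one VMObject

    def read(self, obj, index):
        if index not in obj.pages:
            obj.pages[index] = self.store[index]
            self.owner.setdefault(index, obj)  # first reader becomes the owner
        return obj.pages[index]

    def write(self, obj, index, data):
        self.store[index] = data
        obj.pages[index] = data
        owner = self.owner.setdefault(index, obj)
        if owner is not obj:
            owner.invalidate(index)  # only the one registered owner is told


def demo():
    vm = ToyVM({0: b"old"})
    lower, upper = VMObject("lower"), VMObject("upper")
    vm.read(lower, 0)  # lower vnode's object becomes the owner of page 0
    vm.read(upper, 0)  # upper vnode's object caches its own copy
    vm.write(lower, 0, b"new")      # nobody tells the upper object
    after_lower = (vm.read(lower, 0), vm.read(upper, 0))
    vm.write(upper, 0, b"newer")    # the owner (lower) does get notified
    after_upper = (vm.read(lower, 0), vm.read(upper, 0))
    return after_lower, after_upper
```

In this model a write through the lower object leaves the upper object holding stale data, while a write through the upper object happens to work, because the lower object is the one the reverse lookup finds.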

[ Technically, the read/write operations in any stacked FS should really be implemented in terms of getpages/putpages in the implementation FS, but this is really not there, and wouldn't work anyway, because of the notification problem ]

Unfortunately, "msync" isn't going to do what you want.

One option is to make sure that when you mmap the portalfs space, the mmap is implemented with read/write. This would involve a change to the "default VOPS" {get|put}page operations. It's also very ugly because of where it does and doesn't work. But this was the approach taken in the "nullfs" implementation, to mask the coherency problem, rather than correcting it.

In theory, the "nullfs" should be able to use the referenced VM objects of the underlying FS directly, since it does not attempt non-linear remapping of address ranges, but in reality, the vm_object_t's are referenced directly out of the vnode pointer objects, rather than through an accessor function in upper level code, which would have let the access be trapped and redirected.

Mapping {get|put}pages into underlying read/write is actually exactly backwards, and kind of fossilizes incorrect behaviour into permanency, but it works without a reorganization of the VFS code.

The upshot of this is that you can use it as it is, so long as you don't use any operations that operate on the backing objects via the VM system and expect coherency (e.g. no mmap, no sendfile, etc.).

> - Harry, who apologises if this isn't the right list.

Probably "FreeBSD-FS"... though it's also VM related.

-- Terry
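
For anyone trying to get a stacked filesystem "to break" in the way described above, a userland probe along the following lines may help. It is a minimal sketch in Python, assuming a Unix-like system and a regular file at least one page long; the function names are invented here, and whether a given portalfs object can be mmap'ed at all is not something the message states.

```python
import mmap
import os
import tempfile


def probe(path):
    """Check mmap vs. read/write coherency on one file, in both directions.

    Returns (map_to_read, read_to_map); each is True if the data written
    one way was visible the other way.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # On Unix the default mapping is shared and writable.
        with mmap.mmap(fd, mmap.PAGESIZE) as m:
            m[0:5] = b"hello"                    # store through the mapping
            map_to_read = os.pread(fd, 5, 0) == b"hello"
            os.pwrite(fd, b"world", 0)           # store through write
            read_to_map = m[0:5] == b"world"
    finally:
        os.close(fd)
    return map_to_read, read_to_map


def probe_scratch_file():
    """Run the probe on a throwaway one-page file in the temp directory."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"\0" * mmap.PAGESIZE)
        f.flush()
        return probe(f.name)
```

On an ordinary filesystem with a unified cache both values should come back True. Pointing probe() at a file reached through a stacked mount, and at the same file through the underlying FS, is one way to look for the stale-page behaviour.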