Date:      Sun, 24 Mar 1996 12:13:31 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        adam@veda.is (Adam David)
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: async mounts, etc.
Message-ID:  <199603241913.MAA10046@phaeton.artisoft.com>
In-Reply-To: <199603240248.CAA00668@veda.is> from "Adam David" at Mar 24, 96 02:48:07 am

> This reminds me of another similar problem with an NFS mounted from localhost
> with the union option, it happened a few months back. After many weeks of
> regular operation, it crashed when some limit was exceeded. After this, every
> attempt to mount the upper layer would cause an instant panic. There was not
> enough diskspace at the time to take a crashdump, and I have not dared test
> it since. I used a bunch of symbolic links in the end, instead of the lower
> layer.
> 
> Oh, and that reminds me to ask: Just how broken is nullfs? Is it lying broken
> because it depends on other stuff not yet settled, or is it just waiting to be
> fixed?


I haven't fixed either the unionfs or the nullfs.

Or more specifically, I haven't made the fixed code available because
it wouldn't do any good without my framework changes, as well as the
4.4Lite2 code, plus some changes to that that I have not done yet.


The main problem in any overlay FS is the VOP_LOCK/VOP_UNLOCK code, which
has been complicated beyond belief.  The unionfs, the nullfs, and the
uid mapping, compression, and quota FS's (the latter three in prototype
form) all rely on the ability to stack the file systems.

Unfortunately, the stacking interface layering is, for want of a better
term, "broken".

With a unionfs, for instance, if I want to do a VOP_LOCK, I *should*
VOP_LOCK the underlying vnodes from all FS's involved in the mount;
but since the vnodes and inodes can be separated from each other,
there has to be an interaction with the in-core inode.
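
To make that concrete, here is a minimal sketch of what a union-style
VOP_LOCK conceptually has to do; the struct and function names are
invented for illustration and are not the actual 4.4BSD unionfs code:

#include <stddef.h>

/*
 * Illustrative only: a union-style lock has to take the lock on every
 * underlying vnode backing the union node, and back out cleanly if
 * any one of them refuses.
 */
struct sk_vnode {
	int	(*lock)(struct sk_vnode *);
	int	(*unlock)(struct sk_vnode *);
};

struct sk_union_node {
	struct sk_vnode	*upper;		/* may be NULL */
	struct sk_vnode	*lower;		/* may be NULL */
};

static int
sk_union_lock(struct sk_union_node *un)
{
	int error;

	if (un->upper != NULL && (error = un->upper->lock(un->upper)) != 0)
		return (error);
	if (un->lower != NULL && (error = un->lower->lock(un->lower)) != 0) {
		if (un->upper != NULL)
			un->upper->unlock(un->upper);	/* fail atomically */
		return (error);
	}
	return (0);
}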

In the 4.4Lite2 code, this is somewhat abstracted by the "lockmgr"
routine, but 4.4Lite2 is still broken; there is a comment in the
unionfs code to that effect.

The problem is that their design pushes the locking down into the
lockmgr() routine itself.  This is, in fact, inherently flawed for
most file systems, since it introduces race conditions.


The real fix would be to acquire a vp lock at a higher level, then
call down to the per FS VOP_LOCK (which in most FS module
implementations becomes nothing more than "return( 0);").

If the VOP_LOCK vetos the lock, then the vp lock is released and the
error is propagated back up; otherwise, the lock is acquired and the
routine returns.

This allows the vp references to multiple underlying vp's in an
overlay FS to be locked separately and individually; this is where
a call into the vn_lock code would properly belong in the FS.
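
A rough sketch of that split, with invented names (sk_vn_lock() stands
in for the VFS-level lock taken above the FS, and the per-FS hook gets
the veto):

struct sk_vp {
	int	 v_locked;			/* VFS-level vp lock */
	int	(*v_op_lock)(struct sk_vp *);	/* per-FS VOP_LOCK hook */
};

static int
sk_vn_lock(struct sk_vp *vp)
{
	int error;

	vp->v_locked = 1;			/* acquire the vp lock up here */
	if ((error = vp->v_op_lock(vp)) != 0) {	/* per-FS hook may veto */
		vp->v_locked = 0;		/* release and propagate */
		return (error);
	}
	return (0);
}

/* What most leaf file systems would supply for the hook. */
static int
sk_leaffs_lock(struct sk_vp *vp)
{
	(void)vp;
	return (0);
}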


The final remaining problem is the reverse link problem; this is
more an issue of computing transitive closure over the internal
use of locking in an FS: remember that once you have traversed
from a unionfs to an underlying FS, the underlying FS is operating
on a vp for itself and not for the unionfs that got you there.

Typically, this will not be a problem if thread/instance identifier
recursive locking (PID, in current implementations that don't allow
aioread/aiowrite or don't implement kernel threads) is used, since
locks in that case are entrancy counting semaphores.  The big issue
in that case becomes lock release on siginterrupt, and that's already
properly layered.
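
For illustration, an entrancy-counting lock keyed on a thread/instance
identifier looks roughly like this (names invented, not real kernel code):

typedef long sk_owner_t;		/* PID or thread identifier */

struct sk_rlock {
	sk_owner_t	owner;		/* 0 == unowned */
	int		depth;		/* entrancy count */
};

static int
sk_rlock_acquire(struct sk_rlock *lk, sk_owner_t who)
{
	if (lk->owner == who) {		/* re-entry by the same owner */
		lk->depth++;
		return (0);
	}
	if (lk->owner != 0)
		return (-1);		/* contended; a real lock would sleep */
	lk->owner = who;
	lk->depth = 1;
	return (0);
}

static void
sk_rlock_release(struct sk_rlock *lk)
{
	if (--lk->depth == 0)
		lk->owner = 0;
}

A stacked FS that re-enters the lower FS's lock under the same PID just
bumps the count instead of deadlocking.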


I freely admit to the possibility that there are other ways to fix
this than those I have suggested; for instance, it is still *very*
useful to have a vnode/offset buffer cache mapping, even if you do
implement device/offset cache mapping to avoid throwing away perfectly
good cache blocks on vnode reuse (which we currently do).  This is
useful because it avoids a bmap (an expensive operation) on every call.
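
The difference between the two keying schemes, sketched with invented
names:

struct sk_vnode_key {			/* vnode/offset: probe the cache */
	void	*vp;			/* which file */
	long	 logical_off;		/* offset in the file; no bmap needed */
};

struct sk_device_key {			/* device/offset: survives vnode reuse */
	int	 dev;			/* which device */
	long	 physical_blk;		/* requires a bmap translation first */
};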

This could be resolved by allocating the vnode in the inode structure
instead of as a separate entity that can be separately recycled; the
vnode would become dependent on FS type for the pool to which it would
be returned (ie: you would need to add a per FS VOP for discarding
vnodes).

For a "union_node" (to get back to unionfs), the vnode would be
allocated there.
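
A sketch of what that allocation change would look like, again with
invented names: the vnode lives inside the FS-specific node, so
recycling one necessarily recycles the other, and each FS supplies its
own discard hook to hand the pair back to its own pool.

struct sk_vnode {
	int	 v_usecount;
	void	*v_data;		/* back-pointer to the owning node */
};

struct sk_union_node {
	struct sk_vnode	 un_vnode;	/* embedded, not separately recycled */
	void		*un_uppervp;
	void		*un_lowervp;
};

/*
 * The per-FS VOP that would be added so the VFS can return the node
 * to the right pool when the vnode is discarded.
 */
typedef void (*sk_vop_discardvnode_t)(struct sk_vnode *vp);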

I still argue that, however it is done, the problem *should* be
resolved.  The second chance caching, which is useless because the
buffers have already been invalidated by vnode dissociation in the
current code, needs to go.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


