Date:      Tue, 6 Aug 1996 16:50:33 +0100 (BST)
From:      Doug Rabson <dfr@render.com>
To:        Terry Lambert <terry@lambert.org>
Cc:        Michael Hancock <michaelh@cet.co.jp>, jkh@time.cdrom.com, tony@fit.qut.edu.au, freebsd-fs@freebsd.org
Subject:   Re: NFS Diskless Dispare...
Message-ID:  <Pine.BSI.3.95.960806163307.10082P-100000@minnow.render.com>
In-Reply-To: <199608051859.LAA11723@phaeton.artisoft.com>


[moved to freebsd-fs]

On Mon, 5 Aug 1996, Terry Lambert wrote:

> What I'm suggesting is that there needs to be both a VFS_VGET and
> a VFS_VPUT (or VFS_VRELE).  With the additional per fs release
> mechanism, each FS instance can allocate an inode pool at its
> instantiation (or do it on a per instance basis, the current
> method which makes inode allocation so slow...).

I'm not really sure how this would work for filesystems without a flat
namespace.  VFS_VGET is not supported by msdosfs, cd9660, nfs, and
probably others.
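
For concreteness, the interface keys everything on a single flat,
stable index per mounted fs; a filesystem with no such index can only
decline.  A sketch (the msdosfs stub shows roughly how the unsupported
case reads; names and details are illustrative, not quoted from the
tree):

/*
 * The VFS switch entry: map a flat index (for ufs, the inode
 * number) to a vnode within one mounted fs instance.
 */
int VFS_VGET(struct mount *mp, ino_t ino, struct vnode **vpp);

/*
 * Without a stable flat index there is nothing to key on, so
 * the operation can only be stubbed out.
 */
int
msdosfs_vget(mp, ino, vpp)
	struct mount *mp;
	ino_t ino;
	struct vnode **vpp;
{
	return (EOPNOTSUPP);
}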

> 
> Consider UFS: the in core inode struct consists of a bunch of in core
> data elements (which should probably be in their own structure) and
> a "struct  dinode i_din" for the on disk inode.
> 
> You could modify this as:
> 
> struct inode {
> 	struct icinode	i_ic;		/* in core inode*/
> 	struct vnode	i_iv;		/* vnode for inode*/
> 	struct dinode	i_din;		/* on disk inode*/
> };
> 
> 
> Essentially, allocation of an inode would allocate a vnode.  There
> would never be an inode without a vnode.
> 
> 
> The VFS_VPUT would put the vnode into a pool maintained by the
> FS per fs instance (the in core fs structure would need an
> additional structure element to point to the maintenance data).
> 
> The FS itself would use generic maintenance routines shared by
> all FS's... and capable of taking a structure size for i_ic and
> i_din element size variations between FS types.  This would
> maintain all common code in the common interface.
> 
> 
> The use of the vget to associate naked vnodes with the FS's would
> go away; in no case is a naked vnode ever useful, since using vnode
> buffer elements requires an FS context.
> 
> 
> In effect, the ihash would become a vnhash and LRU for use in
> reclaiming vnode/inode pairs.  This would be much more efficient
> than the current dual allocation sequence.
> 
> 
> This would allow the discard of the vclean interface, and of the
> lock used to ensure it operates (a lock which has to be reimplemented
> and reimplemented correctly on a per FS basis in the XXX_LOCK and
> XXX_UNLOCK FS specific routines).

Wait a minute.  VOP_LOCK is not there just for vclean to work.  If you
took it out, a lot of the VOPs in ufs would break due to unexpected
reentry.  The lock is there to ensure that operations which modify the
vnode are properly sequenced even if the process has to sleep during
the operation.
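
A sketch of the kind of breakage I mean (the op and its details are
illustrative, not from the tree): a typical ufs VOP modifies inode
state, sleeps in the buffer cache, then finishes; only the vnode lock
keeps a second process from entering in between.

int
ufs_example_op(vp)
	struct vnode *vp;		/* caller holds VOP_LOCK */
{
	struct inode *ip = VTOI(vp);
	struct buf *bp;
	int error;

	ip->i_flag |= IN_CHANGE;	/* start modifying inode state */
	/*
	 * bread() can sleep waiting on the disk.  Without the
	 * vnode lock, a second process could enter this VOP here
	 * and see (or clobber) the half-updated inode.
	 */
	if ((error = bread(vp, (daddr_t)0, ip->i_fs->fs_bsize,
	    NOCRED, &bp)) != 0)
		return (error);
	brelse(bp);
	/* finish the update; the state was consistent throughout */
	return (0);
}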

> 
> 
> The vnode locking could then be done in common code:
> 
> 
> vn_lock( vp, flags, p)
> struct vnode *vp;
> int flags;
> struct proc *p;
> {
> 	int st;
> 
> 	/* actual lock*/
> 	if( ( st = ...) == SUCCESS) {
> 		if( ( st = VOP_LOCK( vp, flags, p)) != SUCCESS) {
> 			/* lock was vetoed, undo actual lock*/
> 			...
> 		}
> 	}
> 	return( st);
> }
> 
> 
> The point here is that the lock contention (if any) can be resolved
> without ever hitting the FS itself in the failure case.
> 

You can't do this for NFS.  If you use exclusive locks in NFS and a
server dies, you can easily end up holding a lock on the root vnode
until the server reboots.  To make it work for NFS, you would have to
make the lock interruptible, which forces you to fix, all over the
place, code that does not check the error return from VOP_LOCK.
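
Every caller would then have to grow something like this, instead of
assuming the lock always succeeds (following the vn_lock() interface
sketched above; LK_EXCLUSIVE and LK_INTR are illustrative flag names,
not the current tree):

int
example_caller(vp, p)
	struct vnode *vp;
	struct proc *p;
{
	int error;

	/*
	 * The sleep inside the lock must be broken by a signal
	 * instead of hanging until the dead server reboots.
	 */
	if ((error = vn_lock(vp, LK_EXCLUSIVE | LK_INTR, p)) != 0)
		return (error);		/* e.g. EINTR */
	/* ... operate on the locked vnode ... */
	VOP_UNLOCK(vp, 0, p);
	return (0);
}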

I hope we are not talking at cross purposes.  We are talking about the
vnode lock, not advisory record locking, aren't we?

--
Doug Rabson, Microsoft RenderMorphics Ltd.	Mail:  dfr@render.com
						Phone: +44 171 251 4411
						FAX:   +44 171 251 0939



