From owner-freebsd-hackers  Wed Nov 13 10:05:45 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id KAA02520
          for hackers-outgoing; Wed, 13 Nov 1996 10:05:45 -0800 (PST)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id KAA02495;
          Wed, 13 Nov 1996 10:05:32 -0800 (PST)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id KAA22484; Wed, 13 Nov 1996 10:54:58 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199611131754.KAA22484@phaeton.artisoft.com>
Subject: Re: NFS bypass op and the utok layer
To: michaelh@cet.co.jp (Michael Hancock)
Date: Wed, 13 Nov 1996 10:54:58 -0700 (MST)
Cc: Hackers@freebsd.org, freebsd-fs@freebsd.org
In-Reply-To: <Pine.SV4.3.95.961113233409.14783F-100000@parkplace.cet.co.jp> from "Michael Hancock" at Nov 13, 96 11:49:43 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Boy, people keep asking questions for which my work is the answer...
this is more than a little cool.  8-).


> Were these even considered when the FreeBSD vnode stacking implementation
> was done? 
> 
> The NFS default op is the one returning the NOT SUPPORTED error.  A bypass
> op would allow you to stack on top of an out-of-kernel layer which could
> then be layered on a utok layer to cross the boundary again.
> 
> I guess the fs memory allocation architecture is not compatible with this.

You have hit the nail on the head.

There are many places where the FS is expected to allocate something
which it will never deallocate, or deallocate something which it did
not allocate.  Examples include:

o	The vfs_syscalls.c generated namei cn_pnbuf
o	The NFS generated namei cn_pnbuf
o	The vnode
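
To make the ownership problem concrete, here is a minimal user-space
sketch. It is not kernel code: `struct cname`, `cname_init()`,
`cname_release()` and the two `fs_lookup_*()` functions are
hypothetical names, with `pnbuf` standing in for cn_pnbuf. It
contrasts the asymmetric style (the caller allocates, the FS frees)
with a symmetric one (the layer that allocates also frees).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's componentname; only the
 * path buffer matters for this sketch. */
struct cname {
    char *pnbuf;     /* path buffer, analogous to cn_pnbuf */
    int   fs_freed;  /* set when the FS layer released the buffer */
};

/* Caller side: allocate the path buffer. Returns 0 or -1. */
int cname_init(struct cname *cn, const char *path)
{
    assert(cn != NULL && path != NULL);
    size_t len = strlen(path) + 1;
    cn->pnbuf = malloc(len);
    if (cn->pnbuf == NULL)
        return -1;
    memcpy(cn->pnbuf, path, len);
    cn->fs_freed = 0;
    return 0;
}

/* Caller side: release the buffer in the layer that allocated it. */
void cname_release(struct cname *cn)
{
    free(cn->pnbuf);          /* free(NULL) is a no-op */
    cn->pnbuf = NULL;
}

/* Asymmetric style: the FS frees memory it never allocated. */
int fs_lookup_asymmetric(struct cname *cn)
{
    /* ... lookup work would go here ... */
    free(cn->pnbuf);
    cn->pnbuf = NULL;
    cn->fs_freed = 1;
    return 0;
}

/* Symmetric style: the FS only reads; the caller keeps ownership. */
int fs_lookup_symmetric(const struct cname *cn)
{
    return (cn->pnbuf != NULL && cn->pnbuf[0] != '\0') ? 0 : -1;
}
```

With the symmetric variant the FS never needs to know how the buffer
was allocated, which is presumably what a layer living in another
address space would require.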

In addition, there are many places where the VOPs are not abstracted
by status return (i.e., they are call-down instead of veto interfaces).
Examples include:

o	VOP_LOCK
o	VOP_ADVLOCK
o	VFS_MOUNT
	o	NFS export list processing
	o	root mount processing
	o	remount processing
	o	mount point covering
o	namei()
	o	CREATE op in EXISTS case with no intention of
		overwrite in the case of collision
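
As a rough illustration of the veto style, the sketch below has the
common code do the bookkeeping while the FS only gets to say yes or
no through a status return. Everything here is an assumption made for
the example: `struct fs_ops`, `generic_advlock()`, `readonly_fs_veto()`
and the lock type numbering are hypothetical, not the FreeBSD VOP
interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-FS hook: return 0 to allow, an errno value to veto. */
typedef int (*veto_fn)(int lock_type);

struct fs_ops {
    veto_fn advlock_veto;   /* NULL means the FS has no opinion */
};

struct lock_state {
    int held;               /* generic bookkeeping, owned by common code */
};

/* Veto style: common code does the work, the FS may only refuse. */
int generic_advlock(const struct fs_ops *ops, struct lock_state *st,
                    int lock_type)
{
    assert(ops != NULL && st != NULL);
    if (ops->advlock_veto != NULL) {
        int error = ops->advlock_veto(lock_type);
        if (error != 0)
            return error;   /* vetoed: lock state is left untouched */
    }
    st->held = lock_type;   /* bookkeeping lives in exactly one place */
    return 0;
}

/* Example FS that refuses exclusive locks (type 2 in this sketch). */
int readonly_fs_veto(int lock_type)
{
    return lock_type == 2 ? EROFS : 0;
}
```

An FS that registers no hook costs only a NULL check. In a call-down
design, by contrast, each FS would have to carry its own copy of the
bookkeeping in `generic_advlock()`.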


Without a clear abstraction, it's impossible to build a utok/ktou
layer (I would prefer a ktou to a bypass op; it's more general, and
doesn't require an NFS loopback).
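
For reference, the difference between a default op that returns a
"not supported" error and a bypass op can be sketched as follows.
This is a toy model in user space: `struct layer`, `layer_call()`,
`default_notsupp()`, `default_bypass()` and the op numbering are
hypothetical names chosen for the example, not the actual vnode
operations vector.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { OP_READ, OP_WRITE, OP_COUNT };   /* hypothetical op numbering */

struct layer;
typedef int (*op_fn)(struct layer *self, int op, int arg);

struct layer {
    struct layer *lower;           /* next layer down, or NULL */
    op_fn         ops[OP_COUNT];   /* NULL entry means "use default_op" */
    op_fn         default_op;
};

int layer_call(struct layer *l, int op, int arg);

/* Default that rejects anything the layer does not implement. */
int default_notsupp(struct layer *self, int op, int arg)
{
    (void)self; (void)op; (void)arg;
    return EOPNOTSUPP;
}

/* Default that forwards unimplemented ops to the layer below. */
int default_bypass(struct layer *self, int op, int arg)
{
    if (self->lower == NULL)
        return EOPNOTSUPP;
    return layer_call(self->lower, op, arg);
}

/* Dispatch: use the layer's own op if present, else its default. */
int layer_call(struct layer *l, int op, int arg)
{
    assert(l != NULL && op >= 0 && op < OP_COUNT);
    op_fn fn = (l->ops[op] != NULL) ? l->ops[op] : l->default_op;
    return fn(l, op, arg);
}

/* Bottom-layer read: succeeds for non-negative arguments. */
int bottom_read(struct layer *self, int op, int arg)
{
    (void)self; (void)op;
    return arg >= 0 ? 0 : EINVAL;
}
```

A layer whose default is `default_notsupp()` stops every op it does
not implement itself, so nothing can usefully be stacked on top of
it. One using `default_bypass()` passes such ops through to whatever
lies below.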

Particularly problematic are the NFS LEASE VOPs, which are interfaced
by a serious kludge because they are call-down instead of veto, and
therefore cannot be zero-overhead registration based.  If my changes
for fcntl() to support server-side NFS locking (as the subsystem called
by rpc.lockd) are ever integrated, this will add another, identical
kludge for FHTOVP for an NFS LKM.

> Debugging in userland would sure be cool, when you're satisfied take away
> the transport layers and you're back in the kernel.

This was discussed in detail in the Heidemann paper, actually... and
yes, it's the way I'd like to do FS debugging as well.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.