From owner-freebsd-current  Thu Nov  2 17:58:59 1995
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.6.12/8.6.6) id RAA01619
          for current-outgoing; Thu, 2 Nov 1995 17:58:59 -0800
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211])
          by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id RAA01613
          ; Thu, 2 Nov 1995 17:58:47 -0800
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id SAA03367; Thu, 2 Nov 1995 18:57:30 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199511030157.SAA03367@phaeton.artisoft.com>
Subject: RFD: VFS, non-Intel architectures
To: arch@freebsd.org
Date: Thu, 2 Nov 1995 18:57:29 -0700 (MST)
Cc: terry@phaeton.artisoft.com (Terry Lambert), hackers@freebsd.org,
        current@freebsd.org
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 8679      
Sender: owner-current@freebsd.org
Precedence: bulk

REQUEST FOR DISCUSSION

As some of you know, I am doing a port of FreeBSD to the Motorola
Ultra 603/604.

----------------------------------------------------------------------
Rationale:

Part of a port to a different platform has always been support of
the native file system for that platform.


The native OS for the machine I am using for the port is AIX.

In order to support the local file system, two things must be done:

1)	A JFS kernel file system based on VFS must be written

2)	Boot code that can load the kernel from a JFS partition
	must be written.


It would aid me greatly in writing the JFS file system if the VFS
interface matched its documentation.  This is a primary concern
for all VFS based file system authors.

It would aid me greatly if it were possible to share code between the
boot and the kernel file systems.  This is a secondary concern, since
I can safely use the PReP specification for pre-Open Firmware boot code
and use the IBM boot code.  It is not practical to use the IBM code
for a distribution, however, since IBM owns it and it requires an AIX
license.  A distribution version of the port will likely not use JFS
as its native file system, also lessening the secondary concern.


To resolve my primary concern, I would like to modify the VFS interface
code in the kernel (all code that makes VFS calls) to conform to the
design criteria for the 4.4BSD-Lite stackable file system interface.

Note that this concern is an issue for anyone attempting to use
the 4.4BSD-Lite stackable file system interface for platform
ports, and for people attempting to use the code in non-4.4BSD-Lite
derived systems, such as Linux and Windows95.

This interface is documented in John Heidemann's Master's thesis,
which can be obtained by interested persons from ftp.cs.ucla.edu.

The FICUS file system documentation available from the same site is
also applicable.


It is clear that Chris Torek's importation of the Heidemann code for
the 4.4BSD-Lite release was a quick hack job.

Both Garrett Wollman, and later myself, have made patches to Chris
Torek's code to cause it to be more in line with John Heidemann's
intent.


----------------------------------------------------------------------
Proposal:

Here are the layering/framework issues I would like to address
initially:


1)	The vfs_opv_numops global is a count of the total number of
	per interface vop descriptors in the vfs_op_descs[] array
	in the machine generated file vnode_if.c.  The effect of
	having the count occur in the vfs_init.c code instead of
	as a static initialization is to make the framework depend
	on having at least one static entry.

	---
	I would like to replace the count mechanism with a static
	initialization mechanism based on the machine generation
	mechanism.  Specifically, I'd like vnode_if.sh to add the
	following line:

int vfs_opv_numops = sizeof(vfs_op_descs)/sizeof(struct vnodeop_desc *) - 1;

	to the end of vnode_if.c, and an extern reference to vnode_if.h.

	Function will be identical before and after the patches.

	The intent of this change is to allow the operating system
	to act correctly when going from 0->1 static file system
	definitions.

	This is of obvious utility to cross platform porters, who
	no longer have to get a file system operational as a minimal
	porting requirement (allowing the port to be staged), and it
	brings the file system closer into line with the Heidemann
	document.

	This change currently exists as part of my "fs layering patches"
	in ~terry on freefall.cdrom.com.
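
A minimal userland sketch of the idea, not the actual patch.  The
struct layout and the three descriptors are simplified stand-ins
invented for illustration; only the vfs_op_descs[] and vfs_opv_numops
names and the sizeof expression come from the text above.  The sketch
assumes the generated table ends in a NULL entry, which is what the
"- 1" accounts for.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct vnodeop_desc. */
struct vnodeop_desc {
	int         vdesc_offset;   /* slot in each per-fs operations vector */
	const char *vdesc_name;     /* human-readable operation name */
};

/* Hypothetical descriptors; the real ones are emitted by vnode_if.sh. */
static struct vnodeop_desc vop_default_desc = { 0, "default" };
static struct vnodeop_desc vop_lookup_desc  = { 1, "vop_lookup" };
static struct vnodeop_desc vop_create_desc  = { 2, "vop_create" };

/* NULL-terminated table (assumed layout of the generated file). */
struct vnodeop_desc *vfs_op_descs[] = {
	&vop_default_desc,
	&vop_lookup_desc,
	&vop_create_desc,
	NULL
};

/* Compile-time count; the "- 1" excludes the NULL terminator. */
int vfs_opv_numops =
    sizeof(vfs_op_descs) / sizeof(struct vnodeop_desc *) - 1;

/* The run-time walk that the static initializer would replace. */
int
count_ops_at_runtime(void)
{
	int n = 0;

	while (vfs_op_descs[n] != NULL)
		n++;
	return n;
}

/* Sanity check: both counting methods must agree. */
void
check_op_count(void)
{
	assert(count_ops_at_runtime() == vfs_opv_numops);
}
```

Because the count is a constant initializer, it is valid before any
init routine has run, and it stays correct if the table holds nothing
but its terminator.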


2)	The interface between the VFS framework and the file system
	instances incorrectly makes assumptions that the design
	document does not allow it to make.

	a)	The underlying file system is assumed to free the
		path name buffer created by the file name lookup
		mechanism in vfs_lookup.c.

		This assumption constitutes common, undocumented
		state which must be reimplemented in each underlying
		file system.

		There are currently several coding errors in the
		underlying file systems that will cause failures
		in rare circumstances.  These are a direct result
		of incorrect reimplementation of the assumed state
		manipulations regarding the path name buffer.

		---
		I would like to remove the assumption of a BSD
		dependent lookup mechanism.  I would accomplish
		this by moving the cn_pnbuf free into an inverse
		routine called "nameifree" in vfs_lookup.c.

		Function will be identical before and after the
		patches.

		This would simultaneously simplify the task of
		writing a new file system (by removing the state
		assumptions), clean up the state errors in the
		failure mode cases, and lessen the dependency on
		a BSD based path component name lookup mechanism,
		making the code more portable to non-BSD environments
		using dissimilar lookup mechanisms (like Linux).

		Note: this would require concomitant changes to
		the locations where namei() is called and asked to
		preserve the path so that such implied state
		could be easily backed out.  The major offender
		in this case is vfs_syscalls.c.

		Note: this would require changes to the nfs_subs.c
		and dependent modules, since nfs implements its own
		lookup mechanism (called nfs_namei), and so will
		need a parallel backoff mechanism (nfs_nameifree).

		All of these changes exist as part of my "fs layering
		patches" in ~terry on freefall.cdrom.com.
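
A minimal userland sketch of the allocate/free pairing described in
(a), not the actual patch.  The structures are cut-down stand-ins, the
namei() here takes the path as an argument purely for brevity, and
examplefs_create() and example_syscall() are hypothetical names.  The
point it illustrates is that the layer that allocates cn_pnbuf is also
the only layer that releases it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PATH_BUF_LEN 1024	/* stand-in for the kernel's MAXPATHLEN */

/* Simplified stand-ins for the kernel structures. */
struct componentname {
	char *cn_pnbuf;		/* path name buffer owned by the lookup layer */
};

struct nameidata {
	struct componentname ni_cnd;
};

/* Stand-in for namei(): allocates the path buffer and copies the path. */
int
namei(struct nameidata *ndp, const char *path)
{
	size_t len = strlen(path);

	if (len >= PATH_BUF_LEN)
		return -1;	/* the kernel would return ENAMETOOLONG */
	ndp->ni_cnd.cn_pnbuf = malloc(PATH_BUF_LEN);
	if (ndp->ni_cnd.cn_pnbuf == NULL)
		return -1;
	memcpy(ndp->ni_cnd.cn_pnbuf, path, len + 1);
	return 0;
}

/* Proposed inverse of namei(): the single place the buffer is released. */
void
nameifree(struct nameidata *ndp)
{
	free(ndp->ni_cnd.cn_pnbuf);	/* free(NULL) is a no-op */
	ndp->ni_cnd.cn_pnbuf = NULL;	/* makes a repeated call harmless */
}

/* Hypothetical file system operation: it reads but never frees cn_pnbuf. */
int
examplefs_create(struct componentname *cnp)
{
	return cnp->cn_pnbuf[0] == '/' ? 0 : -1;  /* accept absolute paths */
}

/* Hypothetical caller, in the role of a routine in vfs_syscalls.c. */
int
example_syscall(const char *path)
{
	struct nameidata nd = { { NULL } };
	int error;

	if (namei(&nd, path) != 0)
		return -1;
	error = examplefs_create(&nd.ni_cnd);
	nameifree(&nd);		/* released on success and failure alike */
	assert(nd.ni_cnd.cn_pnbuf == NULL);
	return error;
}
```

With this shape the file system's error paths carry no buffer
bookkeeping at all, which is where the reimplementation errors
mentioned above tend to arise.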


	b)	The underlying file system makes calls back to the
		VFS implementation layer to implement advisory file
		locking.  This is in violation of the file system
		layering documentation.  It also poses problems
		for the support of NFS locking, and constitutes an
		undue complication for a file system author (or
		OS porter acting as a native file system author).

		---
		I would like to move the file system advisory locking
		to make use of a two stage lock commit process.  The
		logical explanation of this process is as follows:

		i)	A lock assertion is done locally at the VFS
			and system call interface layer.
		ii)	A lock assertion is made to the underlying
			file system(s).
		iii)(1)	If the lock is not "vetoed" by the underlying
			file system, the assertion is allowed.
		iii)(2)	If the lock is vetoed by the underlying file
			system, the local lock assertion is reverted
			locally and the assertion is disallowed.

		For lock deassertion, the order of local/underlying is
		reversed.

		Function will be identical before and after the
		patches.

		This has the net effect of simplifying the advisory
		locking code in the per file system implementation,
		since most file systems will be using the local
		locking paradigm.  It also moves the POSIX semantics
		into the OS, where they belong.

		The resulting advisory lock structures are to be hung
		from the vnode instead of the per file system inode,
		further generalizing their applicability.

		Finally, it simplifies the case of a stacked or
		union file system mount, since underlying cases
		result in one or more calls, the failure of which
		causes a veto to be returned to the local call
		layer.

		These changes are "code in development".
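
A minimal userland sketch of the two stage commit in steps i) through
iii), not the code in development.  The vnode structure, the
v_fs_advlock hook, and every function name are hypothetical, and a
single boolean stands in for real byte-range lock lists.  How a
failed deassertion should be handled is not specified above; the
sketch simply leaves the local lock in place in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified vnode: the advisory lock state hangs off the vnode itself. */
struct vnode {
	bool v_locked;					/* local lock state */
	int (*v_fs_advlock)(struct vnode *, bool);	/* underlying fs hook */
};

/* Step i: local assertion at the VFS and system call layer. */
static int
local_lock_assert(struct vnode *vp)
{
	if (vp->v_locked)
		return -1;	/* conflicts with an existing local lock */
	vp->v_locked = true;
	return 0;
}

static void
local_lock_revert(struct vnode *vp)
{
	vp->v_locked = false;
}

/* Two stage assertion: local first, then the underlying file system. */
int
vfs_advlock_assert(struct vnode *vp)
{
	assert(vp != NULL);
	if (local_lock_assert(vp) != 0)
		return -1;
	/* Step ii: a file system with no hook never vetoes. */
	if (vp->v_fs_advlock != NULL && vp->v_fs_advlock(vp, true) != 0) {
		local_lock_revert(vp);	/* step iii(2): back out local state */
		return -1;
	}
	return 0;			/* step iii(1): assertion allowed */
}

/* Deassertion reverses the order: underlying first, then local. */
int
vfs_advlock_release(struct vnode *vp)
{
	assert(vp != NULL);
	if (!vp->v_locked)
		return -1;
	if (vp->v_fs_advlock != NULL && vp->v_fs_advlock(vp, false) != 0)
		return -1;
	local_lock_revert(vp);
	return 0;
}

/* Hypothetical underlying file systems: one agrees, one vetoes locks. */
int
localfs_advlock(struct vnode *vp, bool lock)
{
	(void)vp;
	(void)lock;
	return 0;
}

int
vetofs_advlock(struct vnode *vp, bool lock)
{
	(void)vp;
	return lock ? -1 : 0;
}
```

A purely local file system would leave the hook NULL and carry no
locking code of its own, while something like an NFS client would use
the hook to consult its server before the lock is committed.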


	c)	The underlying file system makes assumptions about
		the in core inode structure.  File locks are made
		on a per file system basis.  This makes them
		subject to the same issues as advisory locking
		with regard to portability, code reuse, and file
		system authoring.

		---
		I would like to move the file system file locking
		to make use of a multistage commit process similar
		to that described for the advisory locking.  The
		reasoning is identical.

		The resulting file lock structures are to be hung
		from the vnode instead of the per file system inode,
		further generalizing their applicability.

		Function will be identical before and after the
		patches.

		These changes are "code in development".


	d)	The underlying file system is expected to imply
		retention of the path name buffer when doing a
		lookup for a rename/mkdir/create operation.

		This is an assumption of state.

		---
		I would like to move the assumption of state out of
		the per file system implementation case to simplify
		the implementation of additional file systems during
		OS porting, or otherwise.

		This will have the same effect with regard to bringing
		the file system framework into compliance with the
		Heidemann documentation.

		Function will be identical before and after the
		patches.

		These changes are "code in development".


I would like to invite discussion on these proposed VFS changes.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.