From: Terry Lambert
Message-Id: <199511030157.SAA03367@phaeton.artisoft.com>
Subject: RFD: VFS, non-Intel architectures
To: arch@freebsd.org
Date: Thu, 2 Nov 1995 18:57:29 -0700 (MST)
Cc: terry@phaeton.artisoft.com (Terry Lambert), hackers@freebsd.org,
    current@freebsd.org

REQUEST FOR DISCUSSION

As some of you know, I am doing a port of FreeBSD to the Motorola
Ultra 603/604.

----------------------------------------------------------------------
Rationale:

Part of a port to a different platform has always been support of the
native file system for that platform.

The native OS for the machine I am using for the port is AIX.

In order to support the local file system, two things must be done:

1)  A JFS kernel file system based on VFS must be written.

2)  Boot code that can load the kernel from a JFS partition must be
    written.

It would aid me greatly in writing the JFS file system if the VFS
interface matched its documentation. This is a primary concern for
all authors of VFS-based file systems.

It would aid me greatly if it were possible to share code between the
boot and the kernel file systems. This is a secondary concern, since
I can safely use the PReP specification for pre-Open Firmware boot
code and use the IBM boot code.
It is not practical to use the IBM code for a distribution, however,
since IBM owns it and it requires an AIX license. A distribution
version of the port will likely not use JFS as its native file system,
which also lessens the secondary concern.

To resolve my primary concern, I would like to modify the VFS
interface code in the kernel (all code that makes VFS calls) to
conform to the design criteria for the 4.4BSD-Lite stackable file
system interface.

Note that this concern is an issue for anyone attempting to use the
4.4BSD-Lite stackable file system interface for platform ports, and
for people attempting to use the code in systems not derived from
4.4BSD-Lite, such as Linux and Windows 95.

This interface is documented in John Heidemann's Master's thesis,
which can be obtained by interested persons from ftp.cs.ucla.edu. The
FICUS file system documentation available from the same site is also
applicable.

It is clear that Chris Torek's importation of the Heidemann code for
the 4.4BSD-Lite release was a quick hack job. Both Garrett Wollman
and, later, I have made patches to Chris Torek's code to bring it more
in line with John Heidemann's intent.

Proposal:
----------------------------------------------------------------------

Here are the layering/framework issues I would like to address
initially:

1)  The vfs_opv_numops global is a count of the total number of
    per interface vop descriptors in the vfs_op_descs[] array in the
    machine generated file vnode_if.c.

    The effect of having the count occur in the vfs_init.c code
    instead of as a static initialization is to make the framework
    depend on having at least one static entry.

    --- I would like to replace the count mechanism with a static
    initialization mechanism based on the machine generation
    mechanism.
    Specifically, I'd like vnode_if.sh to add the following line:

    int vfs_opv_numops = sizeof(vfs_op_descs)/sizeof(struct vnodeop_desc *) - 1;

    to the end of vnode_if.c, and an extern reference to vnode_if.h.

    Function will be identical before and after the patches.

    The intent of this change is to allow the operating system to act
    correctly when going from 0->1 static file system definitions.
    This is of obvious utility to cross-platform porters, who no
    longer have to get a file system operational as a minimal porting
    requirement (allowing the port to be staged), and it brings the
    file system closer into line with the Heidemann document.

    This change currently exists as part of my "fs layering patches"
    in ~terry on freefall.cdrom.com.

2)  The interface between the VFS framework and the file system
    instances makes assumptions that the design document does not
    allow it to make.

a)  The underlying file system is assumed to free the path name
    buffer created by the file name lookup mechanism in vfs_lookup.c.

    This assumption constitutes common, undocumented state which must
    be reimplemented in each underlying file system. There are
    currently several coding errors in the underlying file systems
    that will cause failures in rare circumstances. These are a
    direct result of incorrect reimplementation of the assumed state
    manipulations regarding the path name buffer.

    --- I would like to remove the assumption of a BSD dependent
    lookup mechanism. I would accomplish this by moving the cn_pnbuf
    free into an inverse routine called "nameifree" in vfs_lookup.c.

    Function will be identical before and after the patches.

    This would simultaneously simplify the task of writing a new file
    system (by removing the state assumptions), clean up the state
    errors in the failure mode cases, and lessen the dependency on a
    BSD based path component name lookup mechanism, making the code
    more portable to non-BSD environments using dissimilar lookup
    mechanisms (like Linux).
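
    To illustrate the pairing proposed for the path name buffer, here
    is a minimal user-space sketch. It is not the kernel code: the
    names sketch_nameidata, sketch_namei, sketch_nameifree,
    sketch_fs_lookup and SKETCH_MAXPATHLEN are hypothetical stand-ins,
    and malloc/free stand in for the kernel allocator. Only cn_pnbuf
    and the idea of an inverse "nameifree" routine come from the
    proposal above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXPATHLEN 1024	/* hypothetical; stands in for MAXPATHLEN */

/* Hypothetical, cut-down stand-in for the kernel's lookup state. */
struct sketch_nameidata {
	char	*cn_pnbuf;	/* path name buffer owned by the lookup layer */
	int	 cn_hasbuf;	/* nonzero while cn_pnbuf is allocated */
};

/*
 * Forward routine: copy the path into a freshly allocated buffer.
 * Returns 0 on success, -1 on failure; on failure no buffer is held.
 */
int
sketch_namei(struct sketch_nameidata *ndp, const char *path)
{
	size_t len;

	assert(ndp != NULL && path != NULL);
	ndp->cn_pnbuf = NULL;
	ndp->cn_hasbuf = 0;

	len = strlen(path);
	if (len >= SKETCH_MAXPATHLEN)
		return (-1);

	ndp->cn_pnbuf = malloc(len + 1);
	if (ndp->cn_pnbuf == NULL)
		return (-1);
	memcpy(ndp->cn_pnbuf, path, len + 1);
	ndp->cn_hasbuf = 1;
	return (0);
}

/*
 * Inverse routine: the only place the buffer is released.
 * Safe to call after a failed lookup, or more than once.
 */
void
sketch_nameifree(struct sketch_nameidata *ndp)
{
	assert(ndp != NULL);
	if (ndp->cn_hasbuf) {
		free(ndp->cn_pnbuf);
		ndp->cn_pnbuf = NULL;
		ndp->cn_hasbuf = 0;
	}
}

/* A file system operation only reads the buffer; it never frees it. */
size_t
sketch_fs_lookup(const struct sketch_nameidata *ndp)
{
	return (ndp->cn_hasbuf ? strlen(ndp->cn_pnbuf) : 0);
}
```

    In this arrangement a caller brackets every file system operation
    with sketch_namei() and sketch_nameifree(), so the success and
    failure paths release the buffer in one place and the file system
    itself carries no knowledge of who owns it.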

    Note: this would require concomitant changes to the locations
    where namei() is called and asked to preserve the path, so that
    such implied state could be easily backed out. The major offender
    in this case is vfs_syscalls.c.

    Note: this would require changes to nfs_subs.c and dependent
    modules, since nfs implements its own lookup mechanism (called
    nfs_namei), and so will need a parallel backoff mechanism
    (nfs_nameifree).

    All of these changes exist as part of my "fs layering patches" in
    ~terry on freefall.cdrom.com.

b)  The underlying file system makes calls back to the VFS
    implementation layer to implement advisory file locking.

    This is in violation of the file system layering documentation.
    It also poses problems for the support of NFS locking, and
    constitutes an undue complication for a file system author (or OS
    porter acting as a native file system author).

    --- I would like to move the file system advisory locking to make
    use of a two stage lock commit process. The logical explanation
    of this process is as follows:

    i)      A lock assertion is done locally at the VFS and system
            call interface layer.

    ii)     A lock assertion is made to the underlying file
            system(s).

    iii)(1) If the lock is not "vetoed" by the underlying file
            system, the assertion is allowed.

    iii)(2) If the lock is vetoed by the underlying file system, the
            local lock assertion is reverted locally and the
            assertion is disallowed.

    For lock deassertion, the order of local/underlying is reversed.

    Function will be identical before and after the patches.

    This has the net effect of simplifying the advisory locking code
    in the per file system implementation, since most file systems
    will be using the local locking paradigm. It also moves the POSIX
    semantics into the OS, where they belong.

    The resulting advisory lock structures are to be hung from the
    vnode instead of the per file system inode, further generalizing
    their applicability.
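
    The two stage commit described in i) through iii) can be sketched
    in user-space C as follows. This is an illustration under stated
    assumptions, not the code under development: every name here
    (sketch_vnode, sketch_lock_hook, sketch_lock_assert and so on) is
    hypothetical, the lock is a single whole-file flag rather than a
    POSIX byte range, and notifying layers that had already accepted
    before a veto is left out because the description above only
    specifies reverting the local assertion.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXLAYERS 4	/* hypothetical limit on stacked layers */

enum sketch_lockop { SKETCH_LOCK, SKETCH_UNLOCK };

/* Hypothetical per-layer hook: return 0 to accept, nonzero to veto. */
typedef int (*sketch_lock_hook)(enum sketch_lockop op);

/* Hypothetical stand-in for a vnode carrying the lock state. */
struct sketch_vnode {
	int			v_locked;	/* local (VFS layer) state */
	size_t			v_nlayers;	/* underlying layers in use */
	sketch_lock_hook	v_layers[SKETCH_MAXLAYERS];
};

/* Example hooks: a purely local file system accepts; another vetoes. */
int sketch_accept(enum sketch_lockop op) { (void)op; return (0); }
int sketch_veto(enum sketch_lockop op) { (void)op; return (1); }

/*
 * Assert a lock: local first, then each underlying layer.
 * Returns 0 if the lock is granted, -1 if it is refused.
 */
int
sketch_lock_assert(struct sketch_vnode *vp)
{
	size_t i;

	assert(vp != NULL && vp->v_nlayers <= SKETCH_MAXLAYERS);
	if (vp->v_locked)
		return (-1);		/* local conflict; layers not asked */
	vp->v_locked = 1;		/* i) local assertion */

	for (i = 0; i < vp->v_nlayers; i++) {
		if (vp->v_layers[i](SKETCH_LOCK) != 0) {	/* ii) */
			vp->v_locked = 0;	/* iii)(2) veto: revert */
			return (-1);
		}
	}
	return (0);			/* iii)(1) no veto: allowed */
}

/* Deassert a lock: the order is reversed, underlying first, local last. */
void
sketch_lock_deassert(struct sketch_vnode *vp)
{
	size_t i;

	assert(vp != NULL && vp->v_nlayers <= SKETCH_MAXLAYERS);
	for (i = 0; i < vp->v_nlayers; i++)
		(void)vp->v_layers[i](SKETCH_UNLOCK);
	vp->v_locked = 0;
}
```

    A file system with no special locking needs would supply nothing
    more than an accepting hook, which is the simplification claimed
    above for the common local locking case.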

    Finally, the two stage process simplifies the case of a stacked or
    union file system mount, since underlying cases result in one or
    more calls, the failure of which causes a veto to be returned to
    the local call layer.

    These changes are "code in development".

c)  The underlying file system makes assumptions about the in core
    inode structure. File locks are made on a per file system basis.

    This makes them subject to the same issues as advisory locks with
    regard to portability, code reuse, and file system authoring.

    --- I would like to move the file system file locking to make use
    of a multistage commit process similar to that described for the
    advisory locking. The reasoning is identical.

    The resulting file lock structures are to be hung from the vnode
    instead of the per file system inode, further generalizing their
    applicability.

    Function will be identical before and after the patches.

    These changes are "code in development".

d)  The underlying file system is expected to imply retention of the
    path name buffer when doing a lookup for a rename/mkdir/create
    operation.

    This is an assumption of state.

    --- I would like to move the assumption of state out of the per
    file system implementation case to simplify the implementation of
    additional file systems during OS porting, or otherwise.

    This will have the same effect with regard to bringing the file
    system framework into compliance with the Heidemann
    documentation.

    Function will be identical before and after the patches.

    These changes are "code in development".

I would like to invite discussion on these proposed VFS changes.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.