From owner-freebsd-fs Wed Jul 10 15:00:50 1996
Return-Path: owner-fs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id PAA19316 for fs-outgoing; Wed, 10 Jul 1996 15:00:50 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id PAA19296 for ; Wed, 10 Jul 1996 15:00:41 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA27403; Wed, 10 Jul 1996 14:56:01 -0700
From: Terry Lambert
Message-Id: <199607102156.OAA27403@phaeton.artisoft.com>
Subject: Re: Fixing Union_mounts
To: michaelh@cet.co.jp (Michael Hancock)
Date: Wed, 10 Jul 1996 14:56:01 -0700 (MST)
Cc: freebsd-fs@FreeBSD.ORG, terry@lambert.org
In-Reply-To: from "Michael Hancock" at Jul 10, 96 11:26:40 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-fs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> [Please trim off current and leave fs when replying]

OK.

> Terry posted this reply to the "making in /usr/src" thread.  I'd like
> to see all this stackable fs stuff made usable.
>
> I have some questions on Terry's remedies, items 2) and 4) below:
>
> 2) Moving vnode locking to the vnode from the per-fs inode will help
> fix the stacking problems, but what will it do for future advanced
> file systems that need to have special locking requirements?

It will not impact them in any way.  Specifically, the change is from:

	syscall()
		VOP_LOCK()
			return xxx_lock()
				return kern_lock.c lock

to:

	syscall()
		if( kern_lock.c lock == SUCCESS) {
			if( VOP_LOCK() return xxx_lock() == FAILURE) {
				kern_lock.c unlock
			}
		}

Which is to say that the per-FS lock code gets the opportunity to veto
the locking, but in the default case will never veto.  This leaves room
for the complex FS's to veto at will.

The same goes for advisory locking.  It should be obvious how the lock
veto will work for NFS client locking:

	if( local lock == SUCCESS) {
		if( remote lock == FAILURE)
			local unlock
	}

This has the advantage of preventing local conflicts from being
appealed over the wire (and perhaps encountering race conditions as a
result).  (Rough C sketches of both of these orderings appear below,
after the reclaim discussion.)

> 4) Moving the vnodes from the global pool to a per-fs pool to improve
> locality of reference.  Won't this make it hard to manage memory?
> How will efficient reclaim operations be implemented?

The memory is allocable per mount instance.  The problem with the
recovery is in the divorce of the per-FS in-core inode from the per-FS
in-core vnode, as implemented primarily by vclean() and its family of
routines.

Specifically, there is already a "max open" limit on the allocated
inodes, in the same respect, and with the same memory fragmentation
issues coming up as a result.

The reclaim operation will be done by multiplexing ffs_vrele the same
way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal
with per-FS vnode-inode association) currently multiplex VFS_VGET,
etc. (a mock-up of such a switch follows below).

The net effect of a real cleanup (which will require something similar
to this to be implemented, in any case) will be to actually reduce the
number of cache misses -- since there are frequent cases where a vnode
is recycled leaving the buffer cache contents in core.  A subsequent
read fails to detect this fact, and the disk is actually read instead
of a cache hit occurring.  This is a relatively huge overhead, and it
is unnecessary.

This is only foundation work, since it requires a cleanup of the
vclean/etc. interfaces in kern/vfs_subr.c.
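In real C, the first reordering might look something like the sketch
below; the function names are invented for illustration and are not the
actual kern_lock.c or VOP interfaces:

	#include <errno.h>

	struct vnode;				/* opaque for this sketch */

	extern int  generic_lock(struct vnode *);   /* kern_lock.c lock */
	extern void generic_unlock(struct vnode *);
	extern int  fs_lock_veto(struct vnode *);   /* per-FS hook; 0 = no veto */

	/*
	 * Proposed ordering: take the generic lock first, then give the
	 * per-FS code a chance to veto.  The default hook never vetoes.
	 */
	int
	lock_vnode(struct vnode *vp)
	{
		if (generic_lock(vp) != 0)
			return (EBUSY);		/* local conflict; fail early */
		if (fs_lock_veto(vp) != 0) {
			generic_unlock(vp);	/* FS vetoed; back out */
			return (EBUSY);
		}
		return (0);
	}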
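The NFS client case is the same shape, with the over-the-wire lock as
the veto (again invented names, reusing the declarations above):

	extern int  local_lock(struct vnode *);
	extern void local_unlock(struct vnode *);
	extern int  remote_lock(struct vnode *);    /* over-the-wire request */

	/*
	 * Local conflicts fail before anything is sent over the wire;
	 * a remote refusal backs the local lock out.
	 */
	int
	nfs_advlock_sketch(struct vnode *vp)
	{
		if (local_lock(vp) != 0)
			return (EAGAIN);	/* resolved locally; no wire traffic */
		if (remote_lock(vp) != 0) {
			local_unlock(vp);	/* server vetoed */
			return (EAGAIN);
		}
		return (0);
	}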
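And here is the sort of switch the ffs_vrele multiplexing would extend;
the fields imitate, but are not, the real struct vfsops:

	struct mount;
	struct fid;
	struct vnode;

	/*
	 * Mock-up of a per-FS switch for the vnode-inode association
	 * operations, with a vrele entry added for per-FS reclaim.
	 */
	struct vfs_assoc_ops {
		int (*vfs_vget)(struct mount *, unsigned long, struct vnode **);
		int (*vfs_fhtovp)(struct mount *, struct fid *, struct vnode **);
		int (*vfs_vptofh)(struct vnode *, struct fid *);
		int (*vfs_vrele)(struct mount *, struct vnode *);  /* new */
	};

Each mount instance would plug in its own ffs_vget, ffs_fhtovp,
ffs_vptofh, and the new ffs_vrele here.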
It will have *some* effect, in that an inode in the current ihash
without an associated vnode (in the current implementation) will always
have a recoverable vnode.  This should be an immediate win for
ihashget() cache hits, at least in those FS's that implement in-core
inode hashing (FFS/LFS/EXT2).

> This stacked fs stuff is really cool.  You can implement a simple
> undelete in the Union layer by making whiteout entries (see the 4.4
> daemon book).  This would only work for the duration of the mount,
> unlike Novell's persistent transactional stuff, but still very useful.

Better than that.  You could implement persistent whiteout or
umsdos-type attribution in a file the same way, by stacking on top of
the existing FS, and "swallowing" your own file to do the dirty deed.
The duration would be permanent, assuming mount order is preserved.

This was the initial intent of the "mount over" capability: the mount
of the underlying FS would take place, then the FS would be "probed"
for stacking by looking for specific "swallow" files to determine
whether another FS should mount the FS again on the same mount point,
interposing its layer.  (There is a rough sketch of such a probe at the
end of this message.)

This is specifically most useful right now for implementing a "quota"
layer: ripping the quota code out of UFS in particular, and applying it
to any FS which has a quota file on it.  8-).

> There are already crypto-fs implementations out there, but I'd like
> to see more; especially non-ITAR-restricted ones that can be used
> world-wide.

There is a file-compression (not block-compression) FS as well, which
two of John Heidemann's students implemented as part of a class
project.

There is also the concept of a persistent replicated network FS with
intermittent network connectivity (basically, what the FICUS project
implied) for nomadic computing and docking/undocking at geographically
separate locations (I use a floating license from the West coast office
to create a "PowerPoint" presentation, fly across the country, plug my
laptop into the East coast office network, and use a floating license
from the East coast office to make the actual presentation to the
board).

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
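P.S.: The mount-time "swallow file" probe mentioned above could look
roughly like this; every name in it is invented for illustration:

	#include <stddef.h>

	/*
	 * Hypothetical mount-over probe: after the underlying FS is
	 * mounted, look for well-known "swallow" files in its root
	 * and, if one is found, interpose the matching layer on the
	 * same mount point.
	 */
	struct swallow_probe {
		const char *marker;	/* file to look for in the FS root */
		const char *layer;	/* FS layer to interpose over it   */
	};

	static const struct swallow_probe probes[] = {
		{ ".whiteout",  "unionfs" },	/* persistent whiteouts */
		{ "quota.user", "quotafs" },	/* generic quota layer  */
	};

	const char *
	swallow_probe(int (*file_exists)(const char *path))
	{
		size_t i;

		for (i = 0; i < sizeof(probes) / sizeof(probes[0]); i++)
			if (file_exists(probes[i].marker))
				return (probes[i].layer);
		return (NULL);		/* no stacking requested */
	}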