From: Terry Lambert
Message-Id: <199803100857.BAA12940@usr09.primenet.com>
Subject: Re: vnode_pager: *** WARNING *** stale FS code in system
To: mellon@pobox.com (Anatoly Vorobey)
Date: Tue, 10 Mar 1998 08:57:35 +0000 (GMT)
Cc: current@FreeBSD.ORG
In-Reply-To: <19980309152125.08053@techunix.technion.ac.il> from "Anatoly Vorobey" at Mar 9, 98 03:21:25 pm

NOTE: This post has "some of the fun stuff" in it. I love this type of
thing. 8-). This probably goes a tiny way towards Nate's "FS 101".

> Probably a silly question, but still:
>
> What about the NFS as a whole? It's both a provider and a consumer
> of VFS. Does it mean it's a stackable FS? (forgetting for a moment
> the networking details; e.g. export and mount a local filesystem for
> proof of concept).

No. It's two separate pieces, the client and the server.
John Heidemann's net proxy layer (which allows you to stack two arbitrary
layers across a network by proxying the argument descriptor contents and
the VOP descriptor contents) isn't either (though it's a lot more useful
than NFS. 8-)).

> And if it does, how does it manage to exist if you're saying a lot
> of work has to be done to make stackable FSs possible?

It isn't a stackable FS.

> Another question: which OSes _today_ provide for stackable file systems?
> Not FreeBSD, apparently not Linux or Solaris. NT has got "filesystem
> filters" kind of drivers which stack above the FS (and on each other
> if needed) - this is neat and often useful - but hasn't got stackable
> filesystems IIRC.

Well, Windows 95 has them. I and two other guys ported the Heidemann
framework to it for a commercial product. If FreeBSD had all the patches
I did, and undid a couple of other changes (or they were redone in the
Windows 95 code), *and* FreeBSD ran ELF -- then the same loadable modules
could be used by both systems. 8-).

SunOS has them (if you get the DES key from John and download the code
off the UCLA CS Department FTP server).

You can do something similar with the IFSMGR.VXD code in Windows 95; it's
called a "Miniport Driver". You can't do it to the VFAT.VXD or VFAT32.VXD,
unfortunately, because Microsoft implemented the TSD in the VFAT*.VXD.
A TSD is a "Type Specific Driver"; it's what recognizes the partition ID
and exports the "cooked device" to the rest of Windows 95.

You can do the same thing under NT. The documentation will run you
$75,000.

> A third question: can you give a (few?) example(s?) of hypothetical
> useful stackable file systems, besides NULLFS?

Sure. Let me make a definition, first:

Namespace Escapes

    A namespace escape is used by a stacking layer to implement
    stacking-layer-specific storage. Namespace escapes can be
    implemented in one of several ways:

    1)  A file on the root of the FS.
        The file is read and/or written by the stacking layer.
        The file may or may not be hidden. The stacking layer may or
        may not protect the file against modification by users. It
        uses the schg and sunlnk flags to protect against
        modification; it uses a new flag, shide, to hide the file
        from directory lookups. It uses the sappnd flag if it wants
        to keep an immutable log.

        This implementation is called a "Namespace Incursion",
        because it puts a special file name in the namespace, and
        (potentially) obstructs the user from using the name. The
        user can tell it's there because it denies the user the use
        of the same name.

        This is the best way to implement a namespace escape if you
        plan on mounting the FS both with and without the stacking
        layer (example: an FS which will be accessed by more than
        one OS), and you only need to store data that applies to the
        FS as a whole.

    2)  A file in each directory of the FS.
        This is the same as #1, but each directory has a file,
        instead of just the root.

        This is the best way to implement a namespace escape if you
        plan on mounting the FS both with and without the stacking
        layer (example: an FS which will be accessed by more than
        one OS), and you only need to store data that applies to
        individual directories or files.

    3)  A "root-level redirection".
        A stacking layer that implements a root-level redirection
        makes it look like all user accesses to the root of the FS
        do not actually occur on the root of the FS. It does this by
        making a subdirectory (for instance, "./root"), and
        pretending that the subdirectory is the root every time it
        gets a request for the root. It then keeps its own files and
        directories in the real root.

        This is the best way to implement a namespace escape if you
        will *always* be mounting the FS with the stacking layer
        stacked over top of the underlying FS, and you only need to
        store data that applies to the FS as a whole.

    4)  A "directory-level redirection".
        A stacking layer that implements a directory-level
        redirection makes it look like all user accesses to each
        directory in the FS, including the root, are actually
        occurring on subdirectories. This is the classic "files are
        directories" type of implementation that you see Mac and
        OS/2 people lamenting the lack of all of the time.

        When a file is created, you create a directory instead, and
        create a file in it called "data" (or "default", or any other
        implementation-defined name). When the user goes to access
        the file by name, you act as if he had asked for the "data"
        file in the directory with the name of the file he asked
        for. You can then create other "streams" (or "forks" or
        "extended attributes") within the file.

        This is the best way to implement a namespace escape if you
        will *always* be mounting the FS with the stacking layer
        stacked over top of the underlying FS, and you need to store
        data that applies to individual directories or files.

    5)  A "partial directory-level redirection".
        A stacking layer that implements a partial directory-level
        redirection is actually implementing a sort of combination
        of #4 and #2. The stacking layer creates a hidden directory,
        and stores its own files in the directory, usually in a
        subdirectory with the same name as the file the data applies
        to, or just a file with the same name if it only needs one
        extra data stream per file. The actual files are still in
        the same place.

        This is the best way to implement a namespace escape if you
        plan on mounting the FS both with and without the stacking
        layer (example: an FS which will be accessed by more than
        one OS), and you need to store a lot of different types of
        data that apply to individual directories or files.

OK, now onto the fun stuff. 8-).

What kind of stacking FS layers can you build?

Well, you are pretty much limited by your imagination.
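
To make redirection techniques #3 and #4 above a little more concrete,
here is a minimal user-space sketch of the path translation such a layer
might perform. It is plain Python, not kernel VFS code, and every name in
it (REDIRECT_ROOT, DEFAULT_STREAM, root_redirect, dir_redirect) is
hypothetical, chosen only for illustration; a real layer would do this
per path component inside its lookup code rather than on whole strings.

```python
import posixpath

# Hypothetical names, chosen for illustration only.
REDIRECT_ROOT = "root"   # subdirectory presented to users as the FS root (#3)
DEFAULT_STREAM = "data"  # default stream inside each file-as-directory (#4)


def _components(user_path):
    """Normalize a user-visible path and split it into components.

    Normalizing first means ".." can never climb above the apparent root.
    """
    norm = posixpath.normpath("/" + user_path.lstrip("/"))
    return [c for c in norm.split("/") if c]


def root_redirect(user_path):
    """Technique #3: look every user path up below a subdirectory,
    leaving the real root free for the layer's own files."""
    return "/" + "/".join([REDIRECT_ROOT] + _components(user_path))


def dir_redirect(user_path, stream=DEFAULT_STREAM):
    """Technique #4: each user-visible file is really a directory,
    and the file's contents live in a named stream inside it."""
    parts = _components(user_path)
    if not parts:
        raise ValueError("the root itself has no streams in this sketch")
    return "/" + "/".join(parts + [stream])
```

Under these assumptions, root_redirect("/etc/passwd") maps to
"/root/etc/passwd" on the underlying FS, and dir_redirect("/home/bob",
"rsrc") maps to "/home/bob/rsrc", with "/home/bob/data" as the default
stream.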
Here are 10 stacking layers that I've thought up off the top of my head
(I admit, I've been thinking about this for a while now, so several of
them aren't original; you may have seen them before):

QUOTFS
    This layer implements quotas using a file on the FS root. The file
    is either hidden from normal users of the FS using namespace escape
    technique #3, or kept using technique #1 so that you can put quotas
    on MSDOS FS's (which also need to be mounted by DOS/Windows).

UMSDOSFS
    This layer implements compatibility with the Linux UMSDOSFS. It
    works using namespace escape technique #2, because it has to be
    compatible with Linux. In each directory, it looks for (and
    creates, if not present) a hidden file named "--LINUX-.---". This
    file stores things about the files in the directory it's in, like
    UNIX UID, UNIX GID, UNIX permissions, etc.; everything that MSDOSFS
    is too stupid to save for you. Even long filenames, if you didn't
    mount it as a VFAT/VFAT32 mount because it was a DOS 2.11 or
    Windows 3.11 drive. ;-).

    With minor modifications, this FS would allow FreeBSD to boot using
    a subdirectory of an MSDOSFS as its root filesystem (so that people
    could "try it out" without needing to repartition their DOS
    drives).

ATALKFS
    This layer implements resource forks for Macintosh client machines.
    It works using namespace escape technique #5. The directories it
    creates are named ".AppleDesktop" and ".AppleDouble". It uses this
    technique because it makes the code "plug compatible" with
    netatalk. 8-).

    If FreeBSD's namei is patched to correctly inherit flags down, and
    to pass them as part of an opaque cn_pnbuf (needs the "nameifree"
    fixes to make it opaque), then you can use "the POSIX namespace
    escape" to access the forks from UNIX.
    Example:

        Open the data fork for the file "bob" in the current directory:

            bob

        Open the same data fork, naming the fork explicitly:

            //ATALKFS/data/bob

        Open the resource fork for the file "bob" in the current
        directory:

            //ATALKFS/rsrc/bob

        Open the resource fork for the file "tom" in the "/tmp"
        directory:

            //ATALKFS/rsrc//tmp/tom

CRYPTFS
    This layer implements cryptography for the underlying FS. It uses
    namespace escape technique #3 to maintain state, and because the FS
    is useless without the cryptographic layer. If the encrypted and
    decrypted data are not the same size, it uses namespace escape
    technique #4.

    VOP_READ/VOP_WRITE are trapped by this layer. So are file creates,
    deletions, and so on (events for which the cryptographic state
    needs to be synchronized). A good implementation will log
    transactions like this before performing them, in case it crashes
    halfway through an operation.

COMPFS
    This layer implements file-level compression for the underlying FS.
    It uses namespace escape technique #4 to maintain uncompressed
    copies of any files currently in use. When the COMPFS "fsck" is
    run (this is done after a crash), it removes the uncompressed
    copies ("forks"). A cleaner process follows up on closes: if a file
    isn't reopened after a short period of time, its uncompressed image
    is removed.

EAFS
    This layer implements OS/2 extended attributes (these are like Mac
    resource forks, only you can have more than one of them). It is
    like ATALKFS, but uses namespace escape technique #4 so that it can
    store multiple streams. Like ATALKFS, it would benefit from POSIX
    "//" based namespace selection.

ACLFS
    This layer implements Access Control Lists (ACL's). It uses
    namespace escape technique #4 so it can store as many file
    attributes as it wants. This layer extends the VOP's with a
    "VOP_ACL".

UNRMFS
    This layer implements "unrm". It uses namespace escape technique #4
    so it can store as many file forks as it wants.
    This lets you delete the same file twice, and get back either copy,
    because both are saved. It has a companion kernel process that can
    be told to go around the FS looking for deleted files older than a
    set age (i.e., over one month old), and "purge" (that is, *really*
    delete) them.

    This layer depends on the POSIX namespace selection to allow
    (1) the deleted files to be listed by a VOP_READDIR (this requires
    that the FS manage its own vnodes so it can "know" if a given
    directory was opened via the POSIX namespace selection or not,
    since VOP_READDIR doesn't know the directory path), (2) a user
    "purge" command to be built, so purges can happen under user
    control, and (3) a user "unrm" command to be built (which simply
    renames from the deleted to the default namespace).

FLSFS
    This layer implements File Level Security. It is like ATALKFS, but
    uses namespace escape technique #4 so that it can store multiple
    streams.

    This layer requires a session manager process. When a user attempts
    to access a file for which file level security is active, a message
    is sent to the user's session manager requesting credential
    information from the user for the file.

    The session manager can be a "pre-authentication" mechanism, where
    credentials are entered using a command line tool, after login. Or
    it can be a "password cache" mechanism, like Windows 95 uses (this
    kind of defeats the purpose, but is useful for other uses of
    session management credentials, like an SMBFS or NCPFS). Or the
    session manager can be "active".

    An "active" session manager is the most interesting. Using its
    knowledge of the console, or being built into the "screen" program,
    or being built into the xdm, the session manager can actually
    interact with the user on behalf of the FLSFS (or SMBFS, or NCPFS,
    etc.) to interactively ask the user for credentials.
    This would let you have password protection per file, even going so
    far as to have different passwords for different users for a file
    (an entire passwd file could be supported, including password
    aging, etc.).

NSEFS
    This layer implements a shared namespace escape for other layers to
    share between them.

    At this point you might be wondering about a stack consisting of
    multiple FS's that use techniques #3, #4, or #5. You might be
    worried that these would tend to add up rather quickly. 8-).

    The answer is to implement a stacking layer that *only* does
    namespace escaping, and implements a new VOP called VOP_NSE. FS's
    which need a namespace escape can use this VOP (one of the
    arguments is "technique"). Alternately, you can leave it up to the
    NSEFS stacking layer to decide by looking at the FS, and specify
    "I need one file" or "I need a file per directory" or "I need a
    file per file" or "I need multiple files per file".

The possibilities are practically endless. 8-).

                                        Regards,
                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.