Date: Wed, 12 Jul 95 23:52:19 MDT
From: terry@cs.weber.edu (Terry Lambert)
To: faulkner@mpd.tandem.com (Boyd Faulkner)
Cc: questions@freebsd.org
Subject: Re: Linux FS - ignorant question
Message-ID: <9507130552.AA22896@cs.weber.edu>
In-Reply-To: <9507121908.AA20188@olympus> from "Boyd Faulkner" at Jul 12, 95 02:08:16 pm

> Terry,
> I have watched many notes on Linux FS support under FreeBSD and have always
> had one question. Much has been made of how the Linux FS is cavalier about
> its metadata and such, but why isn't the real question start and endpoint.
>
> If I am running FreeBSD and I mount a Linux FS in Good Shape (TM) and
> I fiddle with it and write to it and when I unmount it, I leave it in
> Good Shape (TM), who cares how I do it in the middle. So it is not as
> fast! Perhaps it is more robust.
>
> What am I missing?

The fact that a graceful shutdown is not the only possible shutdown.

Totally aside from the fact that POSIX compliance requires a very specific set of file system events that *MUST* occur, another set of very specific events that *MAY* occur, and a final set of events that *MUST NOT* occur, there's the issue of fragility.

In a non-graceful shutdown, the Linux file system is more fragile than the Berkeley FFS. Specifically, a directory operation that is in progress, and therefore in an indeterminate state, can be left broken.

Some of this brokenness is hidden by the log-structuring of the file system implementation; it's one of the reasons that I dislike many aspects of POSIX with regard to file systems: there are very rigid implementation requirements... not a lot of thought went into the concept of advanced file systems when they wrote the thing.

In general, file system events can be separated into idempotent and non-idempotent events. For consistency purposes, another separation of events into transactions based on time-based atomicity is necessary. For instance, many of the operational requirements in POSIX (and in NFS server caching constraints) are the result of attempts to guarantee atomicity of directory entry manipulation requests as opposed to file metadata operations as opposed to file data operations.

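The idempotent/non-idempotent split above can be illustrated with a minimal sketch. It is a toy in-memory model in Python, not code from any real file system; every name in it (`set_mode`, `increment_link_count`, `replay`) is hypothetical. It only shows why replaying an idempotent event after a crash is harmless while replaying a non-idempotent one is not.

```python
# Toy model: an "event" is a function applied to an in-memory state dict.
# All names here are hypothetical and exist only for illustration.

def set_mode(state, mode):
    """Idempotent: applying it twice leaves the same state as applying it once."""
    state["mode"] = mode


def increment_link_count(state):
    """Non-idempotent: every replay changes the state again."""
    state["nlink"] += 1


def replay(event, state, times, *args):
    """Apply the same event `times` times, as a naive crash recovery might."""
    for _ in range(times):
        event(state, *args)
    return state


# Replaying the idempotent event is safe: both runs end in the same state.
mode_once = replay(set_mode, {"mode": 0o600, "nlink": 1}, 1, 0o644)
mode_twice = replay(set_mode, {"mode": 0o600, "nlink": 1}, 2, 0o644)

# Replaying the non-idempotent event is not: the link count drifts.
link_once = replay(increment_link_count, {"mode": 0o600, "nlink": 1}, 1)
link_twice = replay(increment_link_count, {"mode": 0o600, "nlink": 1}, 2)

print(mode_once == mode_twice)   # True
print(link_once == link_twice)   # False
```

A recovery scheme that cannot tell whether an event reached the disk can simply reissue events of the first kind; events of the second kind need some form of transaction boundary.
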
One thing that people often don't think about is that data and metadata are file attributes, while a file's name is *not* an attribute of the file itself (the NetWare file system, for instance, keeps the file name *as* an attribute of the file). I would be just as happy with some kind of implementation that separated metadata storage into file system event related data and file attribute data.

Where Linux falls down in the EXT2FS (which is actually quite good for a second rev of entirely from-scratch code) is in the coupling of file system events that are by their nature composed of several transactions, the sequence of which should result in atomic updates of metadata for reliability. If you, for instance, update the block allocation bitmap but fail to update a file's disk block list in its inode, you can spend a long time looking for which blocks are free if you crash right at that time. Or if you write a block of data and write the block list in the inode metadata, but fail to update the disk with the metadata before you update the data block itself, the file is in an inconsistent (and potentially unrecoverable) state.

Starting from scratch once again, I'd probably pre-define all of my file system events and actually make the file system do event processing; this would allow you to hook callbacks (for instance, directory content change notification) to the events. Consider that most GUI browsers must currently stat the directories they present to the user at given (short) intervals to maintain an accurate representation of the contents.

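As a rough sketch of what hooking a callback to pre-defined events could look like, here is a toy in-memory model in Python. It is an assumption-laden illustration, not a design from this message or from any real kernel: the `EventFS` class, its `hook` method, and the event names are all hypothetical.

```python
from collections import defaultdict


class EventFS:
    """Toy in-memory 'file system' whose operations are pre-defined events."""

    EVENTS = ("create", "remove")  # hypothetical, fixed event set

    def __init__(self):
        self.dirs = defaultdict(set)    # directory path -> set of entry names
        self.hooks = defaultdict(list)  # directory path -> registered callbacks

    def hook(self, directory, callback):
        """Register a callback for content changes in one directory."""
        self.hooks[directory].append(callback)

    def _dispatch(self, event, directory, name):
        """Apply the event, then notify every callback hooked to the directory."""
        if event not in self.EVENTS:
            raise ValueError("unknown event: " + event)
        if event == "create":
            self.dirs[directory].add(name)
        else:
            self.dirs[directory].discard(name)
        for callback in self.hooks[directory]:
            callback(event, directory, name)

    def create(self, directory, name):
        self._dispatch("create", directory, name)

    def remove(self, directory, name):
        self._dispatch("remove", directory, name)


fs = EventFS()
seen = []
# A "browser" watching /home is told about changes instead of polling stat().
fs.hook("/home", lambda event, directory, name: seen.append((event, name)))
fs.create("/home", "notes.txt")
fs.create("/tmp", "scratch")      # no hook on /tmp, so nothing is reported
fs.remove("/home", "notes.txt")
print(seen)  # [('create', 'notes.txt'), ('remove', 'notes.txt')]
```

Because every change passes through one dispatch point, the watcher only hears about directories it registered for and never has to re-scan on a timer.
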
If you are running an AppleTalk file server, the Macintosh clients poll, requesting volume modification date information at 11 second intervals; since no such per-filesystem information is kept, and the events related to volume changes can't be trapped, the UNIX system must lie to the Macintosh client and say that the volume has changed (and thus every 11 seconds, every Mac client on your network iterates every directory for which a window is open on the Finder desktop).

Anyway, suffice it to say that yes, Linux is cavalier; no, they aren't POSIX compliant; no, that doesn't matter much; and yes, the lack of synchronous updates is, for some file system events, not covered by the saving grace of being log structured, while for others it is.

BTW: How many of you knew that an MSDOS file system that was mounted read-only can be made POSIX compliant if implemented correctly, but one mounted read-write can never be POSIX compliant? 8-).

Terry Lambert
terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.