Date:      Wed, 12 Jul 95 23:52:19 MDT
From:      terry@cs.weber.edu (Terry Lambert)
To:        faulkner@mpd.tandem.com (Boyd Faulkner)
Cc:        questions@freebsd.org
Subject:   Re: Linux FS - ignorant question
Message-ID:  <9507130552.AA22896@cs.weber.edu>
In-Reply-To: <9507121908.AA20188@olympus> from "Boyd Faulkner" at Jul 12, 95 02:08:16 pm

> Terry,
> I have watched many notes on Linux FS support under FreeBSD and have always
> had one question.  Much has been made of how the Linux FS is cavalier about
> its metadata and such, but why isn't the real question start and endpoint.
> 
> If I am running FreeBSD and I mount a Linux FS in Good Shape (TM) and 
> I fiddle with it and write to it and when I unmount it, I leave it in
> Good Shape (TM), who cares how I do it in the middle.  So it is not as
> fast!  Perhaps it is more robust.
> 
> What am I missing?

The fact that a graceful shutdown is not the only possible shutdown.

Totally aside from the fact that POSIX compliance requires a very specific
set of file system events that *MUST* occur, another set of very
specific events that *MAY* occur, and a final set of events that *MUST
NOT* occur, there's the issue of fragility.

In a non-graceful shutdown, the Linux file system is more fragile than
the Berkeley FFS.  Specifically, directory operations that are in
progress, and thus in an indeterminate state, can be broken.

Some of this brokenness is hidden by the log-structuring of the file
system implementation; it's one of the reasons that I dislike many
aspects of POSIX with regard to file systems: there are very rigid
implementation requirements... not a lot of thought went into the
concept of advanced file systems when they wrote the thing.

In general, file system events can be separated into idempotent
and non-idempotent events.  For consistency purposes, another
separation of events into transactions based on time-based
atomicity is necessary.  For instance, many of the operational
requirements in POSIX (and in NFS server caching constraints)
are the result of attempts to guarantee atomicity of directory
entry manipulation requests as opposed to file metadata operations
as opposed to file data operations.  One thing that people often
don't think about is that data and metadata are file attributes,
while a file's name is *not* an attribute of the file itself (the
NetWare file system, for instance, keeps the file name *as* an
attribute of the file).  I would be just as happy with some kind
of implementation that separated metadata storage into file system
event related data and file attribute data.
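
To make the idempotent/non-idempotent distinction concrete, here is a
minimal sketch in Python.  It is a toy, not code from any real file
system: the function names and the dictionary standing in for file
metadata are invented for illustration.  The point is only that an
idempotent event can be replayed when you don't know whether it took
effect, while a non-idempotent one cannot.

```python
# Hypothetical illustration: these names are invented, not from any real file system.

def set_length(state, name, length):
    """Idempotent: applying it twice leaves the same state as applying it once."""
    new = dict(state)
    new[name] = length
    return new

def extend_by(state, name, delta):
    """Non-idempotent: every replay changes the state again."""
    new = dict(state)
    new[name] = new.get(name, 0) + delta
    return new

start = {"log.txt": 100}

# Replaying the idempotent event is harmless.
once = set_length(start, "log.txt", 512)
twice = set_length(once, "log.txt", 512)
idempotent_ok = (once == twice)

# Replaying the non-idempotent event produces a different result.
once_ext = extend_by(start, "log.txt", 512)
twice_ext = extend_by(once_ext, "log.txt", 512)
replay_differs = (once_ext != twice_ext)
```

If a request may have been applied before a crash or a retransmission,
only the first kind of event can be blindly issued again.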

Where Linux falls down in the EXT2FS (which is actually quite good
for a second rev of entirely from scratch code) is in the coupling
of file system events that are by their nature composed of several
transactions, the sequence of which should result in atomic updates
of metadata for reliability.  If you, for instance, update the block
allocation bitmap but fail to update a file's disk block list in
its inode, you can spend a long time looking for which blocks are
free if you crash right at that time.  Or if you write a block of
data and write the block list in the inode metadata, but fail to
update the disk with the metadata before you update the data block
itself, the file is in an inconsistent (and potentially unrecoverable)
state.
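
The bitmap case can be sketched with a toy model in Python.  This is an
assumption-laden illustration, not EXT2FS code: "ToyFS", "bitmap" and
"inode_blocks" are invented stand-ins, and the "crash" is just an
exception raised between the two steps of one allocation.

```python
# Toy model only: ToyFS, bitmap and inode_blocks are invented stand-ins,
# not real EXT2FS structures.

class Crash(Exception):
    pass

class ToyFS:
    def __init__(self, nblocks):
        self.bitmap = [False] * nblocks   # True = block marked allocated
        self.inode_blocks = []            # block list of a single file

    def allocate(self, crash_after_bitmap=False):
        blk = self.bitmap.index(False)    # first free block
        self.bitmap[blk] = True           # step 1: update allocation bitmap
        if crash_after_bitmap:
            raise Crash("crashed between bitmap and inode update")
        self.inode_blocks.append(blk)     # step 2: update inode block list
        return blk

    def leaked_blocks(self):
        """Blocks marked allocated that the inode does not reference."""
        return [b for b, used in enumerate(self.bitmap)
                if used and b not in self.inode_blocks]

fs = ToyFS(8)
fs.allocate()                             # completes both steps
try:
    fs.allocate(crash_after_bitmap=True)  # only step 1 reaches "disk"
except Crash:
    pass
leaked = fs.leaked_blocks()               # what a checker has to hunt down
```

After the simulated crash one block is marked allocated but belongs to
no file, and the only way to find it is to compare the whole bitmap
against every block list.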

Starting from scratch once again, I'd probably pre-define all of my
file system events and actually build the file system around event
processing; this would allow you to hook callbacks, for instance,
directory content change notification, to the events.  Consider that
most GUI browsers must currently stat the directories they present to
the user at given (short) intervals to maintain an accurate
representation of the contents.  If you are running an AppleTalk file
server, the Macintosh clients poll requesting volume modification date
information at 11 second intervals; since no such per-filesystem
information is kept, and the events related to volume changes can't
be trapped, the UNIX system must lie to the Macintosh client and say
that the volume has changed (and thus every 11 seconds, every Mac
client on your network iterates every directory for which a window is
open on the Finder desktop).
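
As a rough sketch of what hooking a callback to a directory content
change event might look like, here is a toy in Python.  Everything in
it is hypothetical: "ToyDirectory", "watch" and the event names are
invented for illustration and do not correspond to any existing
kernel interface.

```python
# Hypothetical sketch: ToyDirectory and its methods are invented for illustration.

class ToyDirectory:
    def __init__(self):
        self.entries = set()
        self.watchers = []          # callbacks invoked on every content change

    def watch(self, callback):
        """Register a callback taking (event, name)."""
        self.watchers.append(callback)

    def _notify(self, event, name):
        for cb in self.watchers:
            cb(event, name)

    def create(self, name):
        if name not in self.entries:
            self.entries.add(name)
            self._notify("create", name)

    def remove(self, name):
        if name in self.entries:
            self.entries.remove(name)
            self._notify("remove", name)

seen = []
d = ToyDirectory()
d.watch(lambda event, name: seen.append((event, name)))
d.create("a.txt")
d.create("a.txt")   # contents unchanged, so no notification is sent
d.remove("a.txt")
```

A browser or file server registered this way hears about exactly the
changes that happened and never has to re-read a directory on a timer.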

Anyway, suffice it to say that yes, Linux is cavalier, no, they aren't
POSIX compliant, no, that doesn't matter much, and yes, for some file
system events the lack of synchronous updates is not covered by the
saving grace of being log structured, while for others it is.



BTW: How many of you knew that an MSDOS file system that was mounted
read-only can be made POSIX compliant if implemented correctly, but
one mounted read-write can never be POSIX compliant?  8-).


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.


