From owner-freebsd-hackers  Tue Mar 25 04:53:03 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id EAA10060
          for hackers-outgoing; Tue, 25 Mar 1997 04:53:03 -0800 (PST)
Received: from eac.iafrica.com (196-31-98-19.iafrica.com [196.31.98.19])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA10055
          for <hackers@freebsd.org>; Tue, 25 Mar 1997 04:52:55 -0800 (PST)
Received: (from rnordier@localhost) by eac.iafrica.com (8.8.5/8.6.12) id OAA18965; Tue, 25 Mar 1997 14:33:09 +0200 (SAT)
From: Robert Nordier <rnordier@iafrica.com>
Message-Id: <199703251233.OAA18965@eac.iafrica.com>
Subject: Re: dump for MS-DOS partitions.
In-Reply-To: <199703242324.QAA23896@phaeton.artisoft.com> from Terry Lambert at "Mar 24, 97 04:24:04 pm"
To: terry@lambert.org (Terry Lambert)
Date: Tue, 25 Mar 1997 14:33:08 +0200 (SAT)
Cc: hackers@freebsd.org, port-i386@netbsd.org
X-Mailer: ELM [version 2.4ME+ PL31 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

[Cc list trimmed.]

Terry Lambert wrote:

> A fsck is relatively trivial.
> 
> That's because there is no difference between a directory entry and
> a physical inode in the MSDOSFS... many of the checks performed by
> the FFS fsck are simply not applicable to the idea of checking an
> MSDOSFS.

That a fsck-like utility for FAT/VFAT is relatively trivial, feasible,
or even desirable, is a dangerous illusion. :-)

What makes fsck itself possible is that the FFS was modified to make
recovery (by fsck) a deterministic process.  If processing is
interrupted, fsck needs only enough smarts to know what the FFS was
busy with, and therefore what must be done, or undone.

A true fsck doesn't need to `know' the filesystem *as data*.  But it
needs a near perfect knowledge of the filesystem *as code*.  Fsck
doesn't really look for broken data structures and repair them; it
identifies interrupted updates and completes them (rolling them back or
forward).

A fsck needs to be paired with a particular FS implementation,
because it is (logically) an integral part of a *specific* FS
implementation.

With the DOS FS(es), the situation is quite different.

Even if the dozen or so DOS (or DOS FS) implementations all did
metadata updates ordered the same way, these good intentions
would still potentially be perverted by caching software/subsystems
that don't provide (or are not configured for) `write through'
operation.

In addition, the DOS FS lacks a `clean' flag, so FS repair is not
forced after a crash.  By the time FS repair *is* attempted, there
may have been multiple interrupted updates, undetected, each of which
left FS inconsistencies, which then interacted to produce further
inconsistencies....

Another problem is that a bug in any application can unintentionally
modify the DOS filesystem code itself, or corrupt system tables.  So
however perfect the DOS FS implementation may be, its correct operation
can't be assumed.

Any kind of deterministic fsck for the DOS FS is therefore a pipe dream
(except if only the BSD DOSFS implementation is ever allowed to update
the filesystem ... not a realistic restriction, given why anyone is
likely to be using a DOS FS in the first place).

A DOS FS repair utility has to be heuristic.  But to represent such a
utility as fsck-like is to make false claims.  A heuristic utility
functions completely differently, and a heuristic utility hasn't a
remotely comparable chance of success.

Fsck also provides a very bad model for what a heuristic file repair
utility should be like.  When something has to be done, fsck knows
what it is doing: so it needs a minimum of interaction with the user.

To be of fsck standard, a sensible DOS FS repair utility really needs
to be either:

   o A `smart' interactive filesystem debugger (which is, not
     coincidentally, why the Norton Utilities and PC-Tools were so
     successful on DOS)

   o A goal-seeking, AI-type utility (not unlike a chess program)
     which can run a million `what if' scenarios before deciding,
     in the case of a cross-linked cluster, for example, which link
     to preserve.

> 
> The biggest concerns of chkdsk are:
> 
> o	Clusters referenced by more than one file
> 
> o	Clusters that appear to be referenced, but aren't
> 
> In the first case, the cluster chains are typically duplicated and
> unreferenced by the second file, making one of the files "whole"
> and the other "corrupt" (by definition, the situation can not arise
> in normal operation).

Where two or more directories link to the same cluster, it may be
impossible to resolve the situation sensibly.

Asking the user only puts him in a maze of twisty little decision
paths, all different; an arbitrary decision risks destroying
nearly 100% of the filesystem; and an exhaustive, recursive
analysis of the consequences is likely to take longer than the user
(and/or the universe) is prepared to wait.
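
Detecting a cross-link, as opposed to resolving one, is at least
mechanical.  As a rough sketch only: the toy model below treats the FAT
as a Python list indexed by cluster number, with made-up FREE and EOC
marker values and hypothetical names (walk_chain, find_cross_links,
A.TXT, B.TXT).  It is not the on-disk format and not the code of any
existing tool.

```python
from collections import defaultdict

FREE = 0   # assumed marker for a free cluster (toy value)
EOC = -1   # assumed end-of-chain marker (real FATs use reserved values)

def walk_chain(fat, start):
    """Follow a chain from `start`; stop at end-of-chain, at a free or
    out-of-range entry, or when the chain loops back on itself."""
    seen, visited = [], set()
    cluster = start
    while cluster not in visited and 2 <= cluster < len(fat):
        visited.add(cluster)
        seen.append(cluster)
        nxt = fat[cluster]
        if nxt == EOC or nxt == FREE:
            break
        cluster = nxt
    return seen

def find_cross_links(fat, entries):
    """Return {cluster: [names]} for clusters claimed by more than one
    directory entry.  Reports only; changes nothing."""
    owners = defaultdict(list)
    for name, start in entries.items():
        for cluster in walk_chain(fat, start):
            owners[cluster].append(name)
    return {c: names for c, names in owners.items() if len(names) > 1}

# Toy FAT: index = cluster number, value = next cluster in the chain.
# Clusters 0 and 1 are reserved, so data clusters start at 2.
fat = [FREE, FREE, 3, 4, EOC, 3, FREE, FREE]
entries = {"A.TXT": 2, "B.TXT": 5}   # hypothetical directory entries
cross = find_cross_links(fat, entries)
print(cross)
```

Here both entries run into cluster 3 and share everything after it.
The sketch can say that much, but nothing in the FAT says which of the
two should keep the shared tail.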

> In the second case, it asks "convert cluster chains to files?", and
> makes files to contain the chains.  This, also, can never happen
> during normal operation.

If directories are involved, this can also totally scramble the
filesystem.
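
The second case can be found the same way.  Again as a sketch under
the same assumptions (a toy FAT held in a Python list, made-up FREE
and EOC values, hypothetical function names), this lists the clusters
that are marked in use but are not reachable from any directory entry,
and the heads of the chains they form:

```python
FREE = 0   # assumed marker for a free cluster (toy value)
EOC = -1   # assumed end-of-chain marker (toy value)

def reachable(fat, starts):
    """Collect every cluster reachable from the given start clusters."""
    used = set()
    for start in starts:
        cluster = start
        while 2 <= cluster < len(fat) and cluster not in used:
            used.add(cluster)
            nxt = fat[cluster]
            if nxt in (EOC, FREE):
                break
            cluster = nxt
    return used

def lost_chains(fat, starts):
    """Return (heads, lost): clusters marked in use that no directory
    entry reaches, and those among them that no other lost cluster
    points to.  A lost chain that loops has no head in this model."""
    used = reachable(fat, starts)
    lost = {c for c in range(2, len(fat))
            if fat[c] != FREE and c not in used}
    pointed_to = {fat[c] for c in lost}
    heads = sorted(lost - pointed_to)
    return heads, lost

# One entry starts at cluster 2; clusters 4 -> 5 -> 6 belong to nobody.
fat = [FREE, FREE, 3, EOC, 5, 6, EOC, FREE]
heads, lost = lost_chains(fat, [2])
print(heads, sorted(lost))
```

Note that the FAT alone does not say whether a lost chain once held
file data or a directory, which is where converting chains to files
gets risky.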

What I think the DOS FS needs is a sort of `lint'.  I've been working
on something that even offers optional advice like ``Warning: cross-
linked directories exist: don't even think of running scandisk''. :-)
Being lint-like, it only finds problems; it doesn't fix them.

But writing a heuristic DOS FS fixing utility is probably the equivalent
of writing a program to play a good chess endgame (i.e., win or draw with
three or four pieces on each side).  AI hasn't solved the chess thing,
and (after far too much time spent analyzing the DOS FS problem), I
believe that doing a decent (theoretically satisfying) implementation
would be a thankless waste of time and effort.

-- 
Robert Nordier