From owner-freebsd-hackers Tue Mar 25 04:53:03 1997
Return-Path:
Received: (from root@localhost)
        by freefall.freebsd.org (8.8.5/8.8.5) id EAA10060
        for hackers-outgoing; Tue, 25 Mar 1997 04:53:03 -0800 (PST)
Received: from eac.iafrica.com (196-31-98-19.iafrica.com [196.31.98.19])
        by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id EAA10055
        for ; Tue, 25 Mar 1997 04:52:55 -0800 (PST)
Received: (from rnordier@localhost)
        by eac.iafrica.com (8.8.5/8.6.12) id OAA18965;
        Tue, 25 Mar 1997 14:33:09 +0200 (SAT)
From: Robert Nordier
Message-Id: <199703251233.OAA18965@eac.iafrica.com>
Subject: Re: dump for MS-DOS partitions.
In-Reply-To: <199703242324.QAA23896@phaeton.artisoft.com>
        from Terry Lambert at "Mar 24, 97 04:24:04 pm"
To: terry@lambert.org (Terry Lambert)
Date: Tue, 25 Mar 1997 14:33:08 +0200 (SAT)
Cc: hackers@freebsd.org, port-i386@netbsd.org
X-Mailer: ELM [version 2.4ME+ PL31 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

[Cc list trimmed.]

Terry Lambert wrote:
> A fsck is relatively trivial.
>
> That's because there is no difference between a directory entry and
> a physical inode in the MSDOSFS... many of the checks performed by
> the FFS fsck are simply not applicable to the idea of checking an
> MSDOSFS.

That a fsck-like utility for FAT/VFAT is relatively trivial, feasible,
or even desirable, is a dangerous illusion. :-)

What makes fsck itself possible is that the FFS was modified to make
recovery (by fsck) a deterministic process. If processing is
interrupted, fsck needs only enough smarts to know what the FFS was
busy with, and therefore what must be done, or undone. A true fsck
doesn't need to `know' the filesystem *as data*. But it needs a near
perfect knowledge of the filesystem *as code*.
Fsck doesn't really look for broken data structures and repair them;
it identifies interrupted updates and completes them (rolling them
back or forward). A fsck needs to be paired with a particular FS
implementation, because it is (logically) an integral part of a
*specific* FS implementation.

With the DOS FS(es), the situation is too different. Even if the
dozen or so DOS (or DOS FS) implementations all did metadata updates
ordered the same way, these good intentions would still potentially
be perverted by caching software/subsystems that don't provide (or
are not configured for) `write through' operation.

In addition, the DOS FS lacks a `clean' flag, so FS repair is not
forced after a crash. By the time FS repair *is* attempted, there may
have been multiple interrupted updates, undetected, each of which
left FS inconsistencies, which then interacted to produce further
inconsistencies....

Another problem is that a bug in any application can unintentionally
modify the DOS filesystem code itself, or corrupt system tables. So
however perfect the DOS FS implementation may be, its correct
operation can't be assumed.

Any kind of deterministic fsck for the DOS FS is therefore a pipe
dream (except if only the BSD DOSFS implementation is ever allowed to
update the filesystem ... not a realistic restriction, given why
anyone is likely to be using a DOS FS in the first place).

A DOS FS repair utility has to be heuristic. But to represent such a
utility as fsck-like is to make false claims. A heuristic utility
functions completely differently; and a heuristic utility hasn't a
remotely comparable chance of success.

Fsck also provides a very bad model for what a heuristic file repair
utility should be like. When something has to be done, fsck knows
what it is doing: so it needs a minimum of interaction with the user.
To be of fsck standard, a sensible DOS FS repair utility really needs
to be either:

   o A `smart' interactive filesystem debugger (which is, not
     coincidentally, why the Norton Utilities and PC-Tools were so
     successful on DOS)

   o A utility of a goal-seeking AI-type (not unlike a chess program)
     which can run a million `what if' scenarios before deciding, in
     the case of a cross-linked cluster, for example, which link to
     preserve.

> The biggest concerns of chkdsk are:
>
> o Clusters referenced by more than one file
> o Clusters that appear to be referenced, but aren't
>
> In the first case, the cluster chains are typically duplicated and
> unreferenced by the second file, making one of the files "whole"
> and the other "corrupt" (by definition, the situation can not arise
> in normal operation).

Where one or more directories link to the same cluster, it may be
impossible to resolve the situation sensibly. Asking the user only
puts him in a maze of twisty little decision paths, all different; an
arbitrary decision risks destroying nearly 100% of the filesystem;
and an exhaustive, recursive analysis of the consequences is likely
to take longer than the user (and/or the universe) is prepared to
wait.

> In the second case, it asks "convert cluster chains to files?", and
> makes files to contain the chains. This, also, can never happen
> during normal operation.

If directories are involved, this can also totally scramble the
filesystem.

What I think the DOS FS needs is a sort of `lint'. I've been working
on something that even offers optional advice like ``Warning:
cross-linked directories exist: don't even think of running
scandisk''. :-) Being lint-like, it only finds problems; it doesn't
fix them.

But writing a heuristic DOS FS fixing utility is probably the
equivalent of writing a program to play a good chess endgame (i.e.
win or draw with three or four pieces on each side).
AI hasn't solved the chess thing, and (after far too much time spent
analyzing the DOS FS problem), I believe that doing a decent
(theoretically satisfying) implementation would be a thankless waste
of time and effort.

--
Robert Nordier
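
[Editorial illustration, not part of the original post.] The two
chkdsk concerns quoted above (clusters referenced by more than one
file, and clusters that are allocated but referenced by nothing) can
be sketched as a read-only, lint-style check. This is a minimal toy
in Python and makes no claim to resemble the author's actual tool:
the in-memory table layout, the FREE and EOC stand-in values, and the
names walk_chain and lint are all assumptions made for the example.
Real FAT variants mark end-of-chain with reserved high values and
keep the table on disk.

```python
"""Toy, read-only 'lint' for a FAT-style allocation table (illustrative only)."""
from collections import defaultdict

FREE = 0   # table value for an unallocated cluster
EOC = -1   # hypothetical stand-in for the end-of-chain marker


def walk_chain(fat, start):
    """Yield the clusters of one chain.

    Stops at EOC, at an out-of-range or free entry, or when a cluster
    repeats (a loop), so a damaged table cannot hang the walk.
    """
    seen = set()
    cluster = start
    while cluster != EOC and cluster not in seen:
        # Data clusters are numbered from 2; anything else is invalid here.
        if not (2 <= cluster < len(fat)) or fat[cluster] == FREE:
            break
        seen.add(cluster)
        yield cluster
        cluster = fat[cluster]


def lint(fat, entries):
    """Report problems without modifying anything.

    fat     -- list indexed by cluster number; each value is the next
               cluster in the chain, EOC, or FREE
    entries -- dict mapping a file or directory name to its start cluster
    Returns (cross_linked, lost): a dict of cluster -> names sharing it,
    and a list of allocated clusters that no entry reaches.
    """
    owners = defaultdict(list)   # cluster -> names whose chains reach it
    for name, start in sorted(entries.items()):
        for cluster in walk_chain(fat, start):
            owners[cluster].append(name)

    cross_linked = {c: names for c, names in owners.items() if len(names) > 1}
    lost = [c for c in range(2, len(fat))
            if fat[c] != FREE and c not in owners]
    return cross_linked, lost


if __name__ == "__main__":
    # A.TXT: 2 -> 3 -> 4 -> end.  B.TXT: 5 -> 3 (runs into A.TXT's chain).
    # Clusters 7 -> 8 are allocated, but no directory entry points at them.
    fat = [FREE, FREE, 3, 4, EOC, 3, FREE, 8, EOC, FREE]
    entries = {"A.TXT": 2, "B.TXT": 5}
    cross_linked, lost = lint(fat, entries)
    for cluster, names in cross_linked.items():
        print(f"Warning: cluster {cluster} is cross-linked: {', '.join(names)}")
    print(f"Warning: lost clusters (allocated, unreferenced): {lost}")
```

On this sample table the sketch reports clusters 3 and 4 as shared by
A.TXT and B.TXT, and clusters 7 and 8 as lost. It deliberately stops
there: deciding which link to preserve, or whether a lost chain
should become a file, is the heuristic part the post argues is hard,
and nothing in this toy attempts it.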