From owner-freebsd-hackers  Fri Sep 27 14:27:15 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id OAA03135
          for hackers-outgoing; Fri, 27 Sep 1996 14:27:15 -0700 (PDT)
Received: from phaeton.artisoft.com ([198.17.250.211])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA03096
          for <FreeBSD-hackers@freebsd.org>; Fri, 27 Sep 1996 14:27:09 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA10412; Fri, 27 Sep 1996 14:24:08 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199609272124.OAA10412@phaeton.artisoft.com>
Subject: Re: cvs commit: src/sbin/fsdb fsdb.c
To: guido@gvr.win.tue.nl (Guido van Rooij)
Date: Fri, 27 Sep 1996 14:24:07 -0700 (MST)
Cc: pst@shockwave.com, FreeBSD-hackers@freebsd.org
In-Reply-To: <199609271904.VAA01907@gvr.win.tue.nl> from "Guido van Rooij" at Sep 27, 96 09:04:07 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> The strange thing is that this should be impossible to happen. Anyway,
> the problem is that sometimes a filesystem passes the fsck but still makes
> the kernel panic with a bad dir: mangled entry (or something like that).
> The reason is that the size of the directory is beyond the last datablock,
> thus effectively making a sparse directory file (at least in my case).
> Fsck doesn't find anything because it only examines the present datablocks.
> The kernel does see such a non-present block as a bunch of zeros. And
> that causes the panic because a non-used directory chunk should have a
> reclen field of 255. The fix (until fsck is fixed) is to fsdb the filesystem,
> chdir to the bad dir and do an ls. You will then see the last entry and you
> can reset the size of the directory until just after that entry.

This FS *was* fsck'ed after a crash, or it *wasn't* fsck'ed after a
crash?

If it *wasn't*, then the loop was created in the FS code.

If it *was*, then the fsck code is faulty.

I have already fixed one fault in the lost+found creation handling
(root inode link count).  If a crash occurred after a directory entry
removal, but prior to the VOP_TRUNCATE, the FS would appear to be in a
consistent state.

Such a crash should not mark the FS clean.

The correct mechanism for recovery would be for fsck to traverse the
last directory block in a directory to make sure it has at least one
valid entry and, if it does not, to perform a full traversal with a
file truncation to complete the directory "shrink".
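
As a rough illustration of that check (a minimal sketch, not the actual
fsck code): the entry header layout, the DIRBLKSIZ value of 512, and the
names chunk_has_valid_entry, dir_trimmed_size and put_entry are all
assumptions made up for this example.  It works on an in-memory copy of
the directory data and only computes the size the directory would be
truncated to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRBLKSIZ 512	/* assumed directory chunk size */

/* Simplified, hypothetical entry header; the name bytes follow it. */
struct dirent_hdr {
	uint32_t d_ino;		/* 0 means the slot is unused */
	uint16_t d_reclen;	/* record length, header included */
	uint8_t  d_type;
	uint8_t  d_namlen;
};

/* Return 1 if the chunk parses cleanly and holds at least one live entry. */
static int
chunk_has_valid_entry(const uint8_t *blk)
{
	size_t off = 0;
	int live = 0;

	assert(blk != NULL);
	while (off < DIRBLKSIZ) {
		struct dirent_hdr h;

		if (off + sizeof(h) > DIRBLKSIZ)
			return 0;	/* header would run off the chunk */
		memcpy(&h, blk + off, sizeof(h));
		/* Too short, unaligned, or overrunning reclen: mangled chunk. */
		if (h.d_reclen < sizeof(h) || (h.d_reclen & 3) != 0 ||
		    off + h.d_reclen > DIRBLKSIZ)
			return 0;
		if (h.d_ino != 0 && h.d_namlen != 0 &&
		    h.d_namlen <= h.d_reclen - sizeof(h))
			live = 1;
		off += h.d_reclen;
	}
	return live;
}

/*
 * Size the directory would be truncated to: trailing chunks with no
 * live entry (including all-zero "hole" chunks) are dropped.  A partial
 * trailing chunk is dropped by the integer division.
 */
static size_t
dir_trimmed_size(const uint8_t *data, size_t size)
{
	size_t nchunks = size / DIRBLKSIZ;

	while (nchunks > 0 &&
	    !chunk_has_valid_entry(data + (nchunks - 1) * DIRBLKSIZ))
		nchunks--;
	return nchunks * DIRBLKSIZ;
}

/* Test helper: write one entry at offset off within blk. */
static void
put_entry(uint8_t *blk, size_t off, uint32_t ino, uint16_t reclen,
    const char *name)
{
	struct dirent_hdr h = { ino, reclen, 0, (uint8_t)strlen(name) };

	memcpy(blk + off, &h, sizeof(h));
	memcpy(blk + off + sizeof(h), name, h.d_namlen);
}
```

The sketch treats an empty-but-well-formed trailing chunk and a mangled
one the same way, since in both cases the directory can be shrunk back
to the last chunk that still holds a live entry.  A real fsck would also
have to free the blocks and rewrite the inode, which is not shown here.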


Since the lost+found handling and the truncate back were the two major
semantic changes affecting fsck (all other operations *should* be
idempotent), this should be the last one lurking in the "4.4 semantic
changes" for fsck.


So: can you tell me if the condition resulted from fsck not catching
it after a crash, or if it resulted from normal operation of the FS?


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.