From owner-freebsd-hackers Fri Sep 27 14:27:15 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id OAA03135 for hackers-outgoing; Fri, 27 Sep 1996 14:27:15 -0700 (PDT)
Received: from phaeton.artisoft.com ([198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA03096 for ; Fri, 27 Sep 1996 14:27:09 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA10412; Fri, 27 Sep 1996 14:24:08 -0700
From: Terry Lambert
Message-Id: <199609272124.OAA10412@phaeton.artisoft.com>
Subject: Re: cvs commit: src/sbin/fsdb fsdb.c
To: guido@gvr.win.tue.nl (Guido van Rooij)
Date: Fri, 27 Sep 1996 14:24:07 -0700 (MST)
Cc: pst@shockwave.com, FreeBSD-hackers@freebsd.org
In-Reply-To: <199609271904.VAA01907@gvr.win.tue.nl> from "Guido van Rooij" at Sep 27, 96 09:04:07 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> The strange thing is that this should be impossible to happen. Anyway,
> the problem is that sometimes a filesystem passes the fsck but still makes
> the kernel panic with a bad dir: mangled entry (or something like that).
> The reason is that the size of the directory is beyond the last datablock,
> thus effectively making a sparse directory file (at least in my case).
> Fsck doesn't find anything because it only examines the present datablocks.
> The kernel does see such a non-present block as a bunch of zeros. And
> that causes the panic because a non-used directory chunk should have a
> reclen field of 255. The fix (until fsck is fixed) is to fsdb the filesystem,
> chdir to the bad dir and do an ls. You will then see the last entry and you
> can reset the size of the directory until just after that entry.

This FS *was* fsck'ed after a crash, or it *wasn't* fsck'ed after a crash?
If it *wasn't*, then the loop was created in the FS code. If it *was*,
then the fsck code is faulty. I have already fixed one fault in the
lost+found creation handling (root inode link count).

If a crash occurred after a directory entry removal, but prior to the
VOP_TRUNCATE, the FS would appear to be in a consistent state. Such a
crash should not mark the FS clean. The correct mechanism for recovery
would be for fsck to traverse the last directory block in a directory
to make sure it has at least one valid entry and, if it does not, to
perform a full traversal with a file truncation to complete the
directory "shrink".

Since the lost+found and the truncate back were the two major semantic
changes that affect fsck (all other operations *should* be idempotent),
this should be the last one lurking in the "4.4 semantic changes" for
fsck.

So: can you tell me if the condition resulted from fsck not catching it
after a crash, or if it resulted from normal operation of the FS?


Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
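
The tail check proposed above could be modelled roughly as follows. This
is only a sketch in Python, not fsck's C code, and everything in it is an
assumption made for illustration: the names (has_valid_entry, chunk,
trimmed_dir_size), the 512-byte DIRBLKSIZ, and the representation of a
directory chunk as a list of (inode number, name) pairs with inode 0
marking an unused entry. A chunk with no backing data block (the
"sparse" case from the quoted report) is modelled as None.

```python
DIRBLKSIZ = 512  # assumed directory chunk size for this sketch


def has_valid_entry(block):
    """True if the chunk is allocated and holds at least one in-use entry."""
    if block is None:  # hole: no data block backs this chunk
        return False
    return any(ino != 0 for ino, _name in block)


def chunk(blocks, index):
    """Return chunk `index`, or None when the size extends past the data."""
    return blocks[index] if index < len(blocks) else None


def trimmed_dir_size(size, blocks):
    """Return the size the directory should have after completing a shrink.

    Hypothetical helper: checks the last chunk first and only falls back
    to a full traversal when that chunk has no valid entry.
    """
    nchunks = -(-size // DIRBLKSIZ)  # ceiling division

    # Cheap check: a live entry in the last chunk means nothing to do.
    if nchunks and has_valid_entry(chunk(blocks, nchunks - 1)):
        return size

    # Full traversal: find the last chunk that still has a live entry
    # and truncate to the end of that chunk.
    last_live = -1
    for i in range(nchunks):
        if has_valid_entry(chunk(blocks, i)):
            last_live = i
    return (last_live + 1) * DIRBLKSIZ
```

In this model, a directory whose recorded size runs past its last data
block, or whose trailing chunks hold only unused entries, comes back with
a size that ends at the last chunk containing a live entry. A real fsck
pass would also have to rewrite the inode and free any blocks past the
new size, which the sketch does not attempt.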