From: Thomas David Rivers
Date: Tue, 25 Feb 1997 21:48:47 -0500 (EST)
To: freebsd-hackers
Subject: Re: More on bad dir panics

>
>
> I have been trying to look around the crash dumps, as they are plentiful
> these days (twice a day seems to be the current rate). These always happen
> at the same point and all crashes are similar: the crash occurs on a
> directory lookup stumbling over a block which contains something other
> than directory data.

 This "smells" very similar to my problems... perhaps we can divine the
intersection of these two problems and hit on a solution?

 Things I've determined:

	o) This can happen under a very light load.

	o) It happens on several types of hardware (SCSI, IDE, 386-586).

 The problem appears to be related to inode allocation: an inode is marked
in the free inodes array as "available" (the bit isn't set), and then some
later code reads the data from the disk, checks a field (for the "dup alloc"
panic, it's the "mode" field), and discovers that "oops - it was, in fact,
being used." Does that sound familiar?

 Some other interesting observations:

	o) This can happen with a brand-new file system: if you write trash
	   to the device and then do a newfs, newfs believes it has
	   correctly filled in all the inodes with 0, but some (at least
	   one in my tests) aren't correctly zeroed.

	o) The problem "strikes" and gets progressively worse until the
	   file system simply falls apart. I'm up to twice a day myself
	   on my news server; also, a find in /usr/spool/news now produces
	   a lot of "Bad file descriptor" messages, indicating other file
	   system problems that fsck didn't correct.

	o) Running fsck once isn't enough to restore a file system to a
	   semi-usable state; if you fsck it once and then try again,
	   you'll sometimes notice more corrections.

	o) This isn't "new" - it's something I've experienced in all 2.1
	   releases (although, until now, I was about the sole reporter
	   of the problem).

 I mention this to try to narrow the scope of what we're looking for. It
was something that happened in the 2.1.0 time-frame.

	- Dave Rivers -
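
 The inconsistency described above (an inode whose bit in the free inodes
array is clear while its on-disk "mode" field is nonzero) can be sketched
as a small user-space check in C. This is only an illustration under
stated assumptions, not the kernel's code: the names toy_dinode, toy_cg,
NINODES, is_dup_alloc_candidate and count_inconsistent are hypothetical,
and the structures are drastically simplified stand-ins for the real
on-disk layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NINODES 16	/* hypothetical: a tiny "cylinder group" */

/* Hypothetical, simplified stand-in for an on-disk inode. */
struct toy_dinode {
	uint16_t mode;	/* 0 is taken to mean "not in use" */
};

/* Hypothetical, simplified stand-in for a cylinder group. */
struct toy_cg {
	uint8_t inosused[NINODES / 8];		/* one bit per inode; set = in use */
	struct toy_dinode inodes[NINODES];	/* what is actually on the disk */
};

static int
bit_is_set(const uint8_t *map, unsigned ino)
{
	return (map[ino / 8] >> (ino % 8)) & 1;
}

/*
 * Return 1 when the map claims the inode is free but the on-disk mode
 * says otherwise, i.e. the state that would lead to "dup alloc".
 */
static int
is_dup_alloc_candidate(const struct toy_cg *cg, unsigned ino)
{
	assert(ino < NINODES);
	return !bit_is_set(cg->inosused, ino) && cg->inodes[ino].mode != 0;
}

/* Scan every inode and report the ones in the inconsistent state. */
static unsigned
count_inconsistent(const struct toy_cg *cg)
{
	unsigned ino, bad = 0;

	for (ino = 0; ino < NINODES; ino++) {
		if (is_dup_alloc_candidate(cg, ino)) {
			printf("inode %u: map says free, mode = 0%o\n",
			    ino, (unsigned)cg->inodes[ino].mode);
			bad++;
		}
	}
	return bad;
}
```

 Running count_inconsistent() over a freshly created (all-zero) group
should report nothing; a single inode left with a stale nonzero mode, as
in the newfs observation, would be reported until its bit is set in the
map.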