Date: Wed, 18 Feb 2004 01:29:33 +0100
From: Matthias Andree <matthias.andree@gmx.de>
To: freebsd-stable@FreeBSD.org
Subject: Re: ahc and massive ffs+softupdates corruption
Message-ID: <20040218002933.GB21639@merlin.emma.line.org>
In-Reply-To: <200402172335.i1HNZB7E051322@gw.catspoiler.org>
References: <m38yj15m59.fsf@merlin.emma.line.org> <200402172335.i1HNZB7E051322@gw.catspoiler.org>

On Tue, 17 Feb 2004, Don Lewis wrote:

> > This machine had a SCSI timeout problem on Friday Feb 6th and went down
> > hard, suffering massive file system corruption on /var. At that time,
> > the machine was running portupgrade -a. /var is using softupdates and
> > uses default mount options. As said before, the drive's FWC enable was
> > set to 0 in both the current and saved editions of mode page 8, and I
> > wonder how such massive corruption can happen. I was under the
> > impression that softupdates prevented any on-disk corruptions that
> > require user intervention at fsck time. Given that the write cache was
> > off, I am wondering if there are any ffs+softupdates or tagged command
> > queueing bugs left (that might reorder writes - ordered tag forgotten or
> > something).
>
> The UNKNOWN FILE TYPE complaints are a pretty good clue that a block
> containing inodes got overwritten by garbage. I've seen this sort of
> thing happen if power to a drive fails. It could also be caused by a
> driver or firmware bug that causes data to get written to the wrong
> place, or a cabling or termination problem that causes the drive to see
> the wrong command.

Ah, that makes some sense.

It's unlikely to be a termination/cabling/power problem; the machine is
otherwise rock solid and has been stable after the incident, too. If
there had been a serious power outage, the other machine wouldn't have
been able to log properly or would have logged a reboot.

I won't rule out firmware/hardware bugs, given that the drive just
disappears from the bus when it is inquired too early after power
up/reset - a reset-to-inquiry delay of 10 s in Tekram controllers fixed
this. Adaptec's 2940 UW Pro does something different and works in its
default configuration.

Final question for now: Does one disk block contain multiple inodes?
What is the maximum number?

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95
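
On the closing question: a file system block in the inode area holds
many inodes, which would fit with one overwritten block producing a
whole batch of UNKNOWN FILE TYPE complaints. Below is a minimal sketch
of the arithmetic, assuming the UFS1 on-disk inode size of 128 bytes
(256 bytes for UFS2). The block sizes are examples only; the real value
is whatever block size the file system was created with, which dumpfs
reports. The function name is hypothetical and not a kernel interface.

```python
# On-disk inode sizes in bytes (assumed: UFS1 = 128, UFS2 = 256).
UFS1_DINODE_SIZE = 128
UFS2_DINODE_SIZE = 256


def inodes_per_block(block_size: int, dinode_size: int) -> int:
    """Return how many on-disk inodes fit in one file system block.

    Hypothetical helper for illustration; it only performs the division
    and checks that the block size is a whole multiple of the inode size.
    """
    if block_size <= 0 or dinode_size <= 0:
        raise ValueError("sizes must be positive")
    if block_size % dinode_size != 0:
        raise ValueError("block size must be a multiple of the inode size")
    return block_size // dinode_size


if __name__ == "__main__":
    # Example block sizes only; check the actual file system with dumpfs.
    for bsize in (8192, 16384):
        count = inodes_per_block(bsize, UFS1_DINODE_SIZE)
        print(f"bsize={bsize}: {count} UFS1 inodes per block")
```

Under these assumptions, a single misdirected write of one 16384-byte
block could clobber 128 UFS1 inodes at once.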