Date: Mon, 17 Feb 1997 19:44:08 -0500 (EST)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To: ponds!freefall.cdrom.com!freebsd-hackers, ponds!uriah.heep.sax.de!joerg_wunsch, ponds!McKusick.COM!mckusick, ponds!lakes.water.net!rivers
Cc: ponds!root.com!dg
Subject: Re: dup alloc panic
Message-ID: <199702180044.TAA24862@lakes.water.net>

Well - here's the latest info on what I've discovered. It's very strange, and quite suspect. I'm sending this to solicit ideas...

First, I've discovered that if I zero out the data on the disk where this troublesome inode occurs, everything works just fine. I did this by newfs'ing the file system, then using clri to zero out the inode in question.

Also, once the inode is zero'd out, you have to corrupt it again to reproduce the problem. That is, it doesn't appear that newfs is the culprit in getting the bad data there to begin with... it could just be "stuff" left on your disk.

So, I've written a trashing version of clri that fills the inode with 0xff instead of 0x00. I've also had it print out block offsets, so I can add printf's to newfs and ensure it is writing zeros to that block.

Here comes the weird part. If I trash the inode and run newfs, I can check again (either using my version of clri that prints values, or just using fsck to check the new file system) and the file system is still hosed. Newfs believes it wrote zeros at that block. (I've verified that it lseek'd to that location and wrote 8192 zero bytes there, and that the return values of the lseek() and write() calls indicated success.)

*But* if I trash the inode and run newfs two times in a row, the inode is correctly zero'd out.

I was stunned by this discovery, and have verified it 3 times, just to ensure I wasn't mistaken.

It doesn't occur this way _every_ time, just sometimes. This is by no means consistent... Also, very often when I run my newfs (built optimized, with some printf's in it) the inode will be properly cleared, whereas the one on the boot and fixit floppies will not clear it. Then I'll perform the test again (with no reboot, nothing, just running the commands) and my newfs won't clear the inode... although it indicates it has. It is anything but regular (and most frustrating.)

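For anyone who wants to repeat the check outside of newfs and clri, here is a minimal sketch in C of the fill-and-verify idea described above. It is not the modified clri or newfs; fill_block(), check_block() and BLKSZ are hypothetical names, and the 8192-byte size is simply taken from the write mentioned above. It fills one block with a pattern, prints the offset, fsync()s, and then re-reads the block to count bytes that differ.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSZ 8192	/* assumed block size, from the 8192-byte write above */

/*
 * Fill one BLKSZ block at byte offset `off` with `pattern`, then fsync.
 * Returns 0 on success, -1 on any failed or short call.
 */
int
fill_block(int fd, off_t off, unsigned char pattern)
{
	unsigned char buf[BLKSZ];

	memset(buf, pattern, sizeof(buf));
	fprintf(stderr, "fill: offset %lld (block %lld) with 0x%02x\n",
	    (long long)off, (long long)(off / BLKSZ), pattern);
	if (lseek(fd, off, SEEK_SET) != off)
		return (-1);
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		return (-1);
	return (fsync(fd));
}

/*
 * Re-read the block at `off` and count the bytes that differ from `pattern`.
 * Returns the mismatch count, or -1 on a failed or short read.
 */
long
check_block(int fd, off_t off, unsigned char pattern)
{
	unsigned char buf[BLKSZ];
	long bad = 0;
	size_t i;

	if (lseek(fd, off, SEEK_SET) != off)
		return (-1);
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		return (-1);
	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] != pattern)
			bad++;
	return (bad);
}
```

The intended use would be fill_block(fd, off, 0xff) to trash, run newfs, then check_block(fd, off, 0x00) to see whether the zeros arrived, with `off` taken from the offsets the modified clri prints. One caveat: a read-back through the same device may be satisfied from the cache rather than the disk, so a clean result here does not by itself prove the block reached the platter.
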
This implies that something more low-level (maybe VM, maybe a device driver, except I've seen it on both IDE and SCSI hardware) is going on here, and that the blocks simply are, sometimes, not making it to the disk. This is a *highly suspect* hypothesis - especially since every other block in-the-world seems to make it onto the disk... (see, I told you it was weird.)

I've been playing with it for two days now, and can't figure it out. Some steps I've taken include adding fsync() calls to wtfs() in mkfs.c, running sync, etc...

Here are the steps I go through:

	1) Trash inode # 7680.
	2) newfs the file system.
	3) fsck the new file system; fsck reports the bad inode and clears it out.
	4) Trash inode # 7680.
	5) newfs the file system.
	6) newfs the file system again.
	7) fsck the new file system - no problems reported.

	[Or, replace steps 5/6 with "run the boot/fixit floppy newfs" ]

With that in mind, I can say that the only machines I've seen experience this problem are 386s. I don't have the wherewithal to put together a 486/586 that I can trash in this manner. Also, this doesn't seem to jibe with Joerg's similar problem when newfs'ing MFS file systems. [Joerg - was that even a 386 machine?]

At this point, I'm up for suggestions...

	- Dave Rivers -