Date: Mon, 20 Nov 2000 09:01:03 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: dwalton@acm.org
Cc: julian@elischer.org (Julian Elischer), fs@FreeBSD.ORG
Subject: Re: corrupted filesystem
Message-ID: <200011200901.CAA04577@usr06.primenet.com>
In-Reply-To: <3A17FFA9.30580.3B5C5FD@localhost> from "Dave Walton" at Nov 19, 2000 04:28:25 PM

> > there is a file/dir bit in the directory as well
> > obviously they disagree..
>
> Ah.  Now it makes sense.
>
> > hopefully if you can run fsck -y you can find the inodes of the
> > user's individual directories when they are put in lost+found
>
> Isn't it possible for fsck -y to cause more damage while it tries to
> repair things?

It will probably delete everything as "unreferenced".

It's pretty clear that your /usr directory got hosed, which probably
means the root directory ("/") of the FS that is mounted as /usr.

This will have happened because the directory entry block that was
at the top level was undergoing modification at the time of the
crash.  The most likely reason for this was a create or a rename
of a file or directory in /usr, resulting in a compaction of the
directory entry block containing the damaged entries.

You could "fix" this by dumping the contents of the top level
directory on the device, and sifting through it by hand with a
copy of /sys/ufs/ufs/dir.h in hand.

Once "fixed", you will have added back the references to the inodes
of the directories (and perhaps files, if there were any) under /usr.

See the comments in the dir.h file referenced above for details
on the layout of the directory entries and what makes an entry
"deleted" or still alive.

If any of your entries are the first entry in a directory block,
you are probably screwed for that entry, since the way a directory
entry is deleted from the front of a block is to zero its inode
number.  The one exception to this is the very first entry in the
directory, since that entry will be for "." (and the next for "..").
Since this is the root of a filesystem, we know that the inode
number there will be 2.

Adjusting the d_type for the home directory should be easy.

If the problem is the type on the inode itself, you are in much
worse trouble, since this will mean that the inode that had the
home directory in it has subsequently been reused for a normal
file, and the data is probably destroyed.
This can also be recovered, with difficulty (you will probably
need to write a specialized tool for this, unless you are willing
to live with everything being placed in lost+found).

My suggestion would be to do an image backup of the FS to tape
(preferably twice, just in case; that's a hell of a lot of data)
_NOW_, and that you do it _BEFORE_ you do anything else.

Realize that your previous attempts at recovery, if an automatic
recovery was attempted, may have damaged your data by clearing
inodes that should not have been cleared.  If, during your manual
fsck, you overrode the "no", you could likewise have damaged the
data.

As a general rule, fsck exists not to recover data, but to return
the FS to a consistent state.  It will likely recover all that it
can to lost+found, but that may not be everything.  What it does
recover to lost+found will be directories and files named by inode
number.  Since directories are recovered first, this should mean
that subhierarchies underneath will remain intact, and all you
will lose is the names of the directories.

But beware: with a corrupt root inode, all bets are off: fsck will
not be able to recover, since if / is not a directory, then it
will not be able to create /lost+found within the FS.

So your first order of business _MUST_ be to recover as much of
your / on that device as possible, so that what isn't recovered
by fixing that will be recoverable into a /lost+found which can
be created by the fsck process.

Generally, when I run into this type of failure, I sit down on a
different system and write some tools specific to the recovery
task at hand; this can be a cluster-grep, a finder-of-directory-inodes
(with non-zero reference counts), a simple raw directory editor,
etc.: whatever is needed for the specific task.
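
As a rough illustration of the kind of throwaway tool meant here,
the following is a minimal sketch of a raw directory block dumper.
It is not from the original message.  It assumes the 4.4BSD-style
entry header described in dir.h (d_ino, d_reclen, d_type, d_namlen,
then the name), a 512-byte directory block, and little-endian byte
order as on i386; the names parse_dirblk, dirsiz and with_dtype are
hypothetical.  Check the layout against your own copy of dir.h
before trusting any of it.

```python
import struct
from collections import namedtuple

DIRBLKSIZ = 512   # directory block size assumed here (one disk sector)
ROOTINO = 2       # inode number of a filesystem's root directory
DT_DIR = 4        # d_type value for a directory
DT_REG = 8        # d_type value for a regular file

# d_ino (32 bit), d_reclen (16 bit), d_type (8 bit), d_namlen (8 bit).
# "<" means little-endian, which is an assumption about the source machine.
HEADER = struct.Struct("<IHBB")

Entry = namedtuple("Entry", "off d_ino d_reclen d_type name slack")


def dirsiz(namlen):
    """Space an entry really needs: header plus NUL-terminated name, padded to 4."""
    return HEADER.size + ((namlen + 1 + 3) & ~3)


def parse_dirblk(block):
    """Walk the d_reclen chain of one directory block and yield its entries."""
    if len(block) != DIRBLKSIZ:
        raise ValueError("expected exactly one directory block")
    off = 0
    while off + HEADER.size <= DIRBLKSIZ:
        d_ino, d_reclen, d_type, d_namlen = HEADER.unpack_from(block, off)
        # Stop at the first implausible record instead of guessing.
        if (d_reclen % 4 or d_reclen < dirsiz(d_namlen)
                or off + d_reclen > DIRBLKSIZ):
            break
        start = off + HEADER.size
        name = block[start:start + d_namlen].decode("latin-1")
        # Slack beyond dirsiz() is where remnants of deleted entries may sit.
        yield Entry(off, d_ino, d_reclen, d_type, name,
                    d_reclen - dirsiz(d_namlen))
        off += d_reclen


def with_dtype(block, entry_off, new_type):
    """Return a copy of the block with one entry's d_type byte replaced."""
    patched = bytearray(block)
    patched[entry_off + 6] = new_type  # d_type follows d_ino (4) and d_reclen (2)
    return bytes(patched)
```

An entry with d_ino of zero at offset 0 is a deleted first entry,
and a large slack value marks space where older entries may still
be readable by eye.  Work on a copy of the image, never on the
only copy of the data.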

PS: Your job is going to be much harder, since block devices were
murdered, so unless your system predates the murder, you will
have to be very careful to only read and write in disk block
sized units on disk block boundaries.  For almost all disks,
this will be 512-byte chunks on 512-byte boundaries.  Emergency
recovery was much easier before block devices were shot in the
head.  8-(.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
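
To illustrate the alignment constraint from the PS, here is a
minimal sketch, not part of the original message, of reading an
arbitrary byte range while only ever issuing whole-sector reads on
sector boundaries.  The 512-byte sector size is the figure given
above and is an assumption for any particular disk; read_aligned
is a hypothetical helper name.

```python
import os

SECTOR = 512  # assumed sector size; verify for the actual disk


def read_aligned(fd, offset, length, sector=SECTOR):
    """Return `length` bytes at `offset`, using only aligned whole-sector reads."""
    start = offset - offset % sector                 # round down to a boundary
    end = -(-(offset + length) // sector) * sector   # round up to a boundary
    chunks = []
    pos = start
    while pos < end:
        os.lseek(fd, pos, os.SEEK_SET)
        chunk = os.read(fd, sector)
        if len(chunk) != sector:
            raise OSError("short read at offset %d" % pos)
        chunks.append(chunk)
        pos += sector
    data = b"".join(chunks)
    # Trim the padding that alignment forced us to read.
    return data[offset - start:offset - start + length]
```

Writing follows the same idea in reverse: read the enclosing
sectors, patch the bytes in memory, and write the whole sectors
back at the same aligned offsets.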