Date: Tue, 28 Oct 1997 19:18:45 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: p.richards@elsevier.co.uk (Paul Richards)
Cc: questions@FreeBSD.ORG, current@FreeBSD.ORG
Subject: Re: Retrieving data from a totally hosed filesystem
Message-ID: <199710281918.MAA04726@usr06.primenet.com>
In-Reply-To: <57pvoqypdr.fsf@tees.elsevier.co.uk> from "Paul Richards" at Oct 28, 97 01:43:28 pm

Why do these things come in flocks?  8-(.

> This is Cc'd to current since I think there's a problem with fsck (see
> below).

No, there isn't (see below).

> I totally trashed a partition on my hard disk a week ago ( I was
> playing with bootblocks and scsi adapter settings!) and I'd like to
> try and retrieve data from, not critical but I'm curious how to go
> about it since it's happened.

This is a bug in the slice code, and in user accessibility of SCSI
adapter settings.  8-).  Only the first can be fixed in software...

> Somehow I trashed the disklabel on the FreeBSD partition but
> by using a combination of guesswork and memory I rebuilt one
> and most of my partitions re-appeared without problem. One
> however didn't. fsck said the superblock was invalid so, casting
> caution to the wind I told fsck to use an alternate.

Here is where you purchased your handbasket from the Infernal
Transportation Authority.

Unless this was your first partition (in which case, you probably
blew the data on it during the writes you were doing, and there's
no hope of a sane recovery without great effort), the reason you
were unable to find a superblock is that you had the wrong start
sector for the partition in your disklabel.

At this point, you should have grovelled forward from the last
successfully mounted partition's last superblock looking for the
FS magic number.  This would locate the first superblock, and
therefore the start of the disk.

You can tell the real superblock from duplicates by knowing that
the first superblock on an FS will have a filled in "last mounted on"
string for the last place it was mounted.  Duplicates won't, unless
they are used for a mount and successful unmount.

The message reported by fsck is of ultimate importance.  I doubt
it said exactly "invalid".
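
A minimal sketch of the forward search for the FS magic number described
above, run over a raw image of the disk (for example one made with dd).
The function name is hypothetical, and the offset of the magic field
inside the superblock (1372) and the sector-sized step are assumptions;
check them against the fs.h on your own system before trusting a hit.

```python
import struct

FS_MAGIC = 0x011954   # FFS superblock magic number
MAGIC_OFFSET = 1372   # ASSUMED offset of the magic field inside the superblock
SECTOR = 512          # ASSUMED search granularity: one disk sector


def find_superblock_candidates(image, magic_offset=MAGIC_OFFSET, step=SECTOR):
    """Return byte offsets in `image` at which a superblock might start.

    A hit only means the magic number is in the expected place; each
    candidate still has to be checked by hand, e.g. for a filled in
    "last mounted on" string.
    """
    needle = struct.pack("<I", FS_MAGIC)  # little-endian, as on i386
    hits = []
    # Stop early enough that the 4-byte magic field is always inside the image.
    for start in range(0, len(image) - magic_offset - 3, step):
        field = image[start + magic_offset:start + magic_offset + 4]
        if field == needle:
            hits.append(start)
    return hits
```

Each offset returned is the start of a candidate superblock, not of the
partition.  The primary superblock conventionally sits 8192 bytes past
the start of the partition, so (again as an assumption to verify) the
partition start would be the candidate offset minus 8192.
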
Generally, fsck complains about the magic number (corrupt, or what
you are pointing at is not a superblock), or about corruption (the
non-variable parts of the superblock don't match the contents of
the first backup).

> Many coredumps of fsck later (I had to delete some inodes using
> fsdb in order to get fsck to complete stage1) I had a totally
> unravelled filesystem.

Yes.  It was corrupt as heck at this point.  The problem is that
fsck is a tool for doing two things:

1)  In the event of a partial hardware failure, fsck returns the
    device to a known state so that you may back it up and discard
    the original device.  What you had doesn't qualify, because
    the data was not corrupted by a hardware failure.  The
    difference is that with a hardware failure, you can distinguish
    bad data from good data by virtue of hardware errors returned
    by the driver.

2)  In the event of a crash (power outage, etc.), fsck can be used
    to deterministically back out exactly one failed transaction
    and return the FS metadata to a correct consistent state (an
    async mount gives you a 1 in 2^(n-1) chance of fsck guessing
    correctly -- a snowball's chance in hell).

For what you did, fsck is not an appropriate tool to fix the damage.

> fsck then tried to put all these files into lost+found but aborted
> because it ran out of space in lost+found (which is why I've cc'd
> this to current).
>
> So, now I'm curious about two things
>
> 1) fsck claims it will auto-expand lost+found if it needs to. This
> seems to be very broken since it doesn't. I'm not sure the strategy of
> building lost+found on the fly is a good one since there was no space
> on this partition and it doesn't look like fsck is able
> to get enough space for the directory information.

Prior to 4.4BSD, newfs reserved 8k of directory entry blocks as
a "reserve".

In 4.3BSD, directories could only grow, never shrink.
This meant that if you created a large number of files and then
removed them, the only way the directory entry blocks could be
recovered was to delete and recreate the directory.  This became
more of a problem as things like news servers and terminfo and
other things which abuse the FS directory structure as a database
became more prevalent.

In 4.4BSD, trailing empty directory blocks are ftruncate'd off the
end of a directory.  One consequence of this is that the first time
you fsck, get something in lost+found, and remove it, your 8k
reserve drops to one directory entry block (it has to keep one
block for "." and "..").  So it's useless to pre-reserve space.

Now the file names in lost+found that get created are
"#<inode number>"; on average, this takes more longwords (directory
entry data is 4 byte aligned and null terminated) than average file
names of 7 characters or less.  This means that if you have a huge
number of files to recover, you will use more directory blocks in
the recovery than they used in their original directory.  So even
though the formerly occupied directory blocks are recovered for
reuse earlier in the fsck, they may not contain enough space to
complete the creation of the lost+found.

Luckily, you followed the rules, and kept a 10% reserve space free
on your disk, right?

One of the points of the reserve is to make the block allocation
rapid and relatively efficient (it is, in the limit, a hash
function, and Knuth's "Sorting and Searching" shows hashes degrade
sharply as the table fills, so you really don't want to go over an
85% fill -- a 10% reserve lets you go to 90% fill).

Another reason, however (if you care nothing about how fast your
system runs), is that that space may be needed by root for system
recovery (like you found out) or other administrative tasks.

IMO, you were probably recovering trash (to a large extent) because
of an invalid starting offset.
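
To put rough numbers on the lost+found name-length point above, here
is a small sketch of the per-entry arithmetic.  The 8-byte entry header
and the zero-padding of the inode number are assumptions made for
illustration, and both function names are hypothetical; only the
"#<inode number>" form, the 4 byte alignment and the null terminator
come from the discussion above.

```python
DIRENT_HEADER = 8  # ASSUMED fixed header: inode (4), record length (2), type (1), name length (1)


def dirent_size(name):
    """Bytes needed for one directory entry holding `name`.

    The name is null terminated and then padded up to a 4-byte boundary.
    """
    padded_name = (len(name) + 1 + 3) & ~3
    return DIRENT_HEADER + padded_name


def lost_found_name(inode, max_inode):
    """Hypothetical "#<inode number>" name.

    ASSUMPTION: the number is zero-padded to the width of the largest
    inode number on the filesystem.
    """
    width = len(str(max_inode))
    return "#" + str(inode).zfill(width)
```

Under these assumptions, a 7 character name such as "termcap" costs
dirent_size("termcap") = 16 bytes and a 3 character name costs 12,
while on a filesystem with 7 digit inode numbers every recovered file
gets an 8 character name such as "#0012345" and costs 20 bytes.  The
same files therefore need more directory space in lost+found than they
did in their original directories.
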
It's possible that a full recovery could take much more than the
total disk space in the FS, depending on what random data ended
up in what inodes or indirect blocks.

> That might not actually be the problem since the corruption is quite
> serious but the lost+found directory has been created and fsck does
> start to place files in it so I'm suspicious that this is the
> problem (i.e. not able to find enough space). Either lost+found
> should be pre-allocated as it used to be

See above... in any case, the allocation was only 8k.

> or we should find a way of getting fsck to build lost+found somewhere
> else. I started hacking fsck to try and do this but didn't get very
> far with it, the basic idea of changing the lost+found directory path
> didn't seem to work.

Technically, unless root sucked up the reserve and didn't give it
back, there is supposed to be enough reserve to recover a hard
drive from even catastrophic hardware failure.  But your corruption
was worse than any expectable catastrophic hardware failure, short
of crashing the directory entry blocks and most of the reserve
blocks, simultaneously.

BTW: root sucking up reserve and not giving it back is a pilot
error; if this happened here, avoid doing this in the future...  8-(.

> 2) Has anyone got any bright ideas as to how I can salvage as much of
> the data from this partition as is possible. Since the actual data is
> not corrupted (a dd of the partition shows all the data is still
> untouched) there might be a way to extract the data from the partition
> and reconstruct a filesystem in another area of the disk. Seems like
> an interesting challenge to me and I was wondering if anyone had any
> tools as a starting point. If nothing else, I suspect it should be
> possible to get the unlinked inodes connected to a directory as fsck
> should have done in lost+found and at least retrieve the data in those
> files.

The easiest way would be to mount it read-only, ignoring the clean
bit, and copy off what you could.  You may need to hack things to
make this work.

Then you should be able to blow the reference count on the inodes
you copied off to zero, which will make them go away before more
lost+found allocations are necessary.  You will either need to
write a tool to do this, or use fsdb to clri the inodes.

This space can then be used by a subsequent fsck to continue to
populate lost+found.  One or two large files should be enough.

Under *no* circumstances should you fudge the "clean bit" on the
disk to get a read/write mount to avoid the pain of doing a clri.
A single allocation or timestamp update on a bogus FS could render
the rest of the data permanently unrecoverable.  If you fudge the
clean bit as part of your hacking, you *must* fudge it back to dirty
to be sure to trigger the fsck -- a read-only mount is the only
kind of mount you should use on this thing!


                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.