FreeBSD Mail Archives

Date:      Tue, 28 Oct 1997 19:18:45 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        p.richards@elsevier.co.uk (Paul Richards)
Cc:        questions@FreeBSD.ORG, current@FreeBSD.ORG
Subject:   Re: Retrieving data from a totally hosed filesystem
Message-ID:  <199710281918.MAA04726@usr06.primenet.com>
In-Reply-To: <57pvoqypdr.fsf@tees.elsevier.co.uk> from "Paul Richards" at Oct 28, 97 01:43:28 pm

Why do these things come in flocks?  8-(.


> This is Cc'd to current since I think there's a problem with fsck (see
> below).

No, there isn't (see below).


> I totally trashed a partition on my hard disk a week ago (I was
> playing with bootblocks and SCSI adapter settings!) and I'd like to
> try to retrieve data from it. It's not critical, but I'm curious how
> to go about it since it's happened.

This is a bug in the slice code, and in user accessibility of SCSI
adapter settings.  8-).  Only the first can be fixed in software...


> Somehow I trashed the disklabel on the FreeBSD partition but 
> by using a combination of guesswork and memory I rebuilt one
> and most of my partitions re-appeared without problem. One
> however didn't. fsck said the superblock was invalid so, casting
> caution to the wind I told fsck to use an alternate.

Here is where you purchased your handbasket from the Infernal
Transportation Authority.

Unless this was your first partition (in which case, you probably
blew the data on it during the writes you were doing, and there's
no hope of a sane recovery without great effort), the reason you
were unable to find a superblock is that you had the wrong start
sector for the partition in your disklabel.  At this point, you
should have grovelled forward from the last successfully mounted
partition's last superblock looking for the FS magic number.  This
would locate the first superblock, and therefore the start of the
partition.

You can know the real superblock from duplicates by knowing that
the first superblock on an FS will have a filled in "last mounted
on" string for the last place it was mounted.  Duplicates won't,
unless they are used for a mount and successful unmount.
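
A rough sketch of that forward search, run against a dd image of the
disk rather than the disk itself.  The layout constants are assumptions
taken from the classic little-endian UFS1/FFS format, not from this
message: a primary superblock 8 KB into the partition, fs_magic
(0x011954) at byte 1372 of the superblock, and the "last mounted on"
string (fs_fsmnt) at byte 212.  The function name is made up for
illustration.

```python
import struct

# Assumed UFS1/FFS layout constants (check them against your own <ufs/ffs/fs.h>).
SECTOR = 512        # scan granularity in bytes
SBOFF = 8192        # primary superblock offset from the partition start
MAGIC_OFF = 1372    # offset of fs_magic inside the superblock
FSMNT_OFF = 212     # offset of fs_fsmnt ("last mounted on") inside the superblock
FSMNT_LEN = 512     # assumed MAXMNTLEN
FS_MAGIC = 0x011954


def find_superblocks(image, start=0):
    """Yield (superblock_offset, last_mounted_on) for each candidate.

    `image` is a bytes-like object holding the raw image; `start` should be
    a multiple of SECTOR.  Only the magic number is checked, so every hit
    is a candidate, not a verified superblock.
    """
    for off in range(start, len(image) - MAGIC_OFF - 3, SECTOR):
        (magic,) = struct.unpack_from("<I", image, off + MAGIC_OFF)
        if magic != FS_MAGIC:
            continue
        raw = bytes(image[off + FSMNT_OFF:off + FSMNT_OFF + FSMNT_LEN])
        mounted_on = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        yield off, mounted_on
```

A hit with a non-empty mount string is the likely primary superblock,
and under the assumptions above the partition would start SBOFF bytes
before it.  For a real disk you would read the image in chunks or map
it, rather than load it all into memory.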

The message reported by fsck is of ultimate importance.  I doubt
it said exactly "invalid".  Generally, it complains about the magic
number (corrupt or what you are pointing at is not a superblock), or
about corruption (the non-variable parts of the superblock don't
match the contents of the first backup).


> Many coredumps of fsck later (I had to delete some inodes using
> fsdb in order to get fsck to complete stage1) I had a totally
> unravelled filesystem.

Yes.  It was corrupt as heck at this point.  The problem is that
fsck is a tool for doing two things:

1)	In the event of a partial hardware failure, fsck returns
	the device to a known state so that you may back it up and
	discard the original device.  What you had doesn't qualify,
	because the data was not corrupted by a hardware failure.
	The difference is that with a hardware failure, you can
	distinguish bad data from good data by virtue of hardware
	errors returned by the driver.

2)	In the event of a crash (power outage, etc.), fsck can be used
	to deterministically back up exactly one failed transaction
	and return the FS metadata to a correct consistent state (an
	async mount gives you a 1 in 2^(n-1) chance of fsck guessing
	correctly -- a snowball's chance in hell).

For what you did, fsck is not an appropriate tool to fix the damage.
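
To put a number on the "1 in 2^(n-1)" figure in item 2, here is a small
sketch.  The message does not define n; reading it as the number of
metadata updates in flight at the time of the crash is an assumption,
and the function name is made up for illustration.

```python
from fractions import Fraction


def guess_odds(n):
    """Return the 1-in-2^(n-1) chance of a correct guess as an exact fraction."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Fraction(1, 2 ** (n - 1))


# n is assumed to count outstanding metadata updates; the source leaves it open.
for n in (1, 2, 8, 16, 32):
    odds = guess_odds(n)
    print(f"n={n:2d}: {odds} (about {float(odds):.3g})")
```

With one outstanding update the guess is always right, which matches
the "exactly one failed transaction" case; the odds halve with every
additional update.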


> fsck then tried to put all these files into lost+found but aborted
> because it ran out of space in lost+found (which is why I've cc'd
> this to current).
> 
> So, now I'm curious about two things
> 
> 1) fsck claims it will auto-expand lost+found if it needs to. This
> seems to be very broken since it doesn't. I'm not sure the strategy of
> building lost+found on the fly is a good one since there was no space
> on this partition and it doesn't look like fsck is able to
> get enough space for the directory information.

Prior to 4.4BSD, newfs reserved 8k of directory entry blocks as a
"reserve".  In 4.3BSD, directories could only grow, never shrink.
This meant that if you created a large number of files and then
removed them, the only way the directory entry blocks could be
recovered was to delete and recreate the directory.  This became
more of a problem as things like news servers and terminfo and
other things which abuse the FS directory structure as a database
became more prevalent.

In 4.4BSD, trailing empty directory blocks are ftruncate'd off the
end of a directory.  One consequence of this is that the first time
you fsck, get something in lost+found, and remove it, your 8k reserve
drops to one directory block (it has to keep one block for "." and "..").

So it's useless to pre-reserve space.

Now the file names in lost+found that get created are "#<inode number>";
on average, this takes more longwords (directory entry data is 4 byte
aligned and null terminated) than average file names of 7 characters or
less.  This means that if you have a huge number of files to recover,
you will use more directory blocks in the recovery than they used in
their original directory.

So even though the formerly occupied directory blocks are recovered
for reuse earlier in the fsck, they may not contain enough space to
complete the creation of the lost+found.
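
The directory entry arithmetic above can be sketched as follows.  The
8-byte fixed header (inode number, record length, type, name length) is
an assumption based on the 4.4BSD directory entry layout; the 4-byte
alignment and null termination come from the text.  The sketch ignores
slack at the end of each directory block.

```python
DIRENT_HEADER = 8   # assumed: d_ino (4) + d_reclen (2) + d_type (1) + d_namlen (1)
DIRBLKSIZ = 512     # assumed directory block size


def dirent_size(name):
    """Bytes for one entry: header plus the null-terminated name, padded to 4."""
    padded_name = (len(name) + 1 + 3) & ~3
    return DIRENT_HEADER + padded_name


def entries_per_block(name):
    """Upper bound on how many entries of this size fit in one directory block."""
    return DIRBLKSIZ // dirent_size(name)


for name in ("passwd", "#1234567"):
    print(name, dirent_size(name), entries_per_block(name))
```

Under these assumptions a 6-character name needs 16 bytes while an
8-character "#<inode number>" name needs 20, so the same number of
files takes roughly a quarter more directory space in lost+found.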

Luckily, you followed the rules, and kept a 10% reserve space free on
your disk, right?


One of the points of the reserve is to make the block allocation rapid
and relatively efficient (it is, in the limit, a hash function, and
Knuth's "Sorting and Searching" shows hashes degrade sharply as they
fill, so you really don't want to go over an 85% fill -- a 10% reserve
lets you go to 90% fill).

Another reason, however (if you care nothing about how fast your
system runs), is that that space may be needed by root for system
recovery (like you found out) or other administrative tasks.
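
As an illustration of the hash argument, the sketch below evaluates
Knuth's estimate for linear probing, where an unsuccessful search in a
table with load factor a costs about (1 + 1/(1-a)^2)/2 probes.  The FFS
block allocator is not literally a linear-probing hash table, so treat
this as an analogy for how search cost grows with fill, not as a model
of the allocator.  The function name is made up for illustration.

```python
def probes_unsuccessful(load):
    """Expected probes to find a free slot under linear probing (Knuth's estimate)."""
    if not 0 <= load < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 0.5 * (1 + 1 / (1 - load) ** 2)


# Fill levels chosen to bracket the 85% and 90% figures mentioned in the text.
for fill in (0.50, 0.75, 0.85, 0.90, 0.95):
    print(f"{fill:.0%} full: about {probes_unsuccessful(fill):.1f} probes")
```

The cost roughly doubles between 85% and 90% fill and quadruples again
by 95%, which is the kind of cliff the reserve is meant to keep
ordinary users away from.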

IMO, you were probably recovering trash (to a large extent) because
of an invalid starting offset.  It's possible that a full recovery
could take much more than the total disk space in the FS, depending
on what random data ended up in what inodes or indirect blocks.


> That might not actually be the problem since the corruption is quite
> serious but the lost+found directory has been created and fsck does
> start to place files in it so I'm suspicious that this is the
> problem (i.e. not able to find enough space). Either lost+found
> should be pre-allocated as it used to be

See above... in any case, the allocation was only 8k.


> or we should find a way of getting fsck to build lost+found somewhere
> else. I started hacking fsck to try and do this but didn't get very
> far with it, the basic idea of changing the lost+found directory path
> didn't seem to work.

Technically, unless root sucked up the reserve and didn't give it back,
there is supposed to be enough reserve to recover a hard drive from
even catastrophic hardware failure.  But your corruption was worse than
any expectable catastrophic hardware failure, short of crashing the
directory entry blocks and most of the reserve blocks, simultaneously.

BTW: root sucking up reserve and not giving it back is a pilot error;
if this happened here, avoid doing this in the future...  8-(.


> 2) Has anyone got any bright ideas as to how I can salvage as much of
> the data from this partition as is possible. Since the actual data is
> not corrupted (a dd of the partition shows all the data is still
> untouched) there might be a way to extract the data from the partition
> and reconstruct a filesystem in another area of the disk. Seems like
> an interesting challenge to me and I was wondering if anyone had any
> tools as a starting point. If nothing else, I suspect it should be
> possible to get the unlinked inodes connected to a directory as fsck
> should have done in lost+found and at least retrieve the data in those
> files.

The easiest way would be to mount it read-only, ignoring the clean bit,
and copy off what you could.  You may need to hack things to make this
work.

Then you should be able to blow the reference count on the inodes you
copied off to zero, which will make them go away before more lost+found
allocations are necessary.  You will either need to write a tool to
do this, or use fsdb to clri the inodes.  This space can then be used
by a subsequent fsck to continue to populate lost+found.  One or two
large files should be enough.

Under *no* circumstances should you fudge the "clean bit" on the disk
to get a read/write mount to avoid the pain of doing a clri.  A single
allocation or timestamp update on a bogus FS could render the rest of
the data permanently unrecoverable.  If you fudge the clean bit as part
of your hacking, you *must* fudge it back to dirty to be sure to trigger
the fsck -- a read-only mount is the only kind of mount you should use
on this thing!
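
A minimal sketch of the copy-off step, assuming the damaged filesystem
is already mounted read-only somewhere.  It copies every regular file
it can read, carries on past I/O errors, and records the inode number
of each file it saved so that you know which inodes are candidates for
clri afterwards.  The function name is made up for illustration, and it
does not guard against directory loops on a badly corrupted tree.

```python
import os
import shutil
import stat


def salvage(src_root, dst_root):
    """Copy readable regular files from src_root to dst_root.

    Returns (copied, failed): copied is a list of (inode_number, source_path),
    failed is a list of (path, error) for anything that could not be read.
    """
    copied, failed = [], []

    def note_walk_error(err):
        # Unreadable directories are recorded instead of aborting the walk.
        failed.append((err.filename, err))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(out_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                st = os.lstat(src)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip devices, sockets, symlinks and the like
                shutil.copyfile(src, os.path.join(out_dir, name))
                copied.append((st.st_ino, src))
            except OSError as err:
                failed.append((src, err))
    return copied, failed
```

Something like salvage("/mnt/hosed", "/var/tmp/salvage") would do, where
both paths are hypothetical; the destination must be on a different,
healthy filesystem.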


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199710281918.MAA04726>
