Date: Tue, 25 Aug 2009 23:46:50 -0600
From: Kelly Martin <kellymartin@gmail.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Cc: Roland Smith <rsmith@xs4all.nl>
Subject: Re: hard disk failure - now what?
Message-ID: <1338880b0908252246s21191e83k7c251366b706532@mail.gmail.com>
In-Reply-To: <20090824223247.GD43410@slackbox.xs4all.nl>
References: <1338880b0908241129p75b6845cg26d21804e118364@mail.gmail.com> <20090824223247.GD43410@slackbox.xs4all.nl>
First, thanks to everyone for the really great replies. Many suggestions were quite helpful and have kept me on track. I'll quote a couple of people and then add some comments below.

On Mon, Aug 24, 2009 at 4:32 PM, Roland Smith<rsmith@xs4all.nl> wrote:
> It _could_ just be a bad or improperly connected SATA cable. Try changing or
> re-seating the cable.

I thought of that too, but no luck.

> Read errors cannot damage your data, but write errors can! Immediately stop
> all writing to the disk. Re-mount the partitions on that disk as read-only, or
> unmount them.

That was a consensus among everyone who replied, so I made that step #1. I mounted the partitions read-only and crossed my fingers. Trying to check the integrity of the data, or even get directory listings was another matter, as I got various strange errors... which told me I quite likely had some data loss.

> To see if a disk really is broken, install sysutils/smartmontools, and run
> 'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
> sectors), the disk is dying and should be unplugged to prevent it from getting
> worse.

That's a good idea and I'll try to use it in the future. After plugging the drive in and accessing it, I heard those tell-tale signs of hard drive failure: clicks and pops and other unusual noises, so I know that it has some damage. I hate those sounds, having heard them on failing drives too many times before.

>
>> My question: what kind of checks and/or repair tools should I run on
>> the damaged drive after it's mounted?
>
> As others have mentioned, first make a copy (with the disk unmounted) of the
> partitions on that disk with dd, saving them to another drive. That way you
> can experiment with the data without further deterioration of the
> original.

I ran dd and it took over 20 hours to complete. In fact it just finished this evening, after running all day. Lots of FAILURE errors were reported along the way, enough to fill two console screens or more.

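The exact dd command line isn't given above. As a rough sketch only, a common way to image a failing partition with dd looks like the following. It is shown on a scratch file so it is safe to try; the temporary files and the 64k block size are assumptions for illustration, and on the real system the input would be the unmounted partition (for example /dev/ad0s1d) and the output a file on a healthy drive.

```shell
#!/bin/sh
# Sketch on scratch files; SRC stands in for the unmounted source partition
# and DST for the image file on a healthy drive (both names hypothetical).
SRC=$(mktemp)
DST=$(mktemp)

# 1 MiB of test data standing in for the partition contents.
head -c 1048576 /dev/urandom > "$SRC"

# conv=noerror : keep going after a read error instead of aborting.
# conv=sync    : pad short or failed reads with zeros up to the block size,
#                so data after a bad spot stays at its original offset.
dd if="$SRC" of="$DST" bs=64k conv=noerror,sync 2>/dev/null

# On healthy input the image should be byte-identical to the source.
cmp "$SRC" "$DST" && echo "image matches source"

# rm -f "$SRC" "$DST"   # clean up the scratch files when done
```

With these options a block that fails to read ends up zero-filled in the image, so a smaller bs should lose less data around each bad spot, at the cost of an even slower copy.
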
And of course to complicate things I didn't have a spare drive as an output device that was the *same size*, so I used a smaller drive thinking that it wouldn't matter since the source drive wasn't full anyway. I have no idea if data is scattered around on the FFS filesystem such that cloning a mostly empty, larger drive onto something smaller might lose data... I searched Google and couldn't find the answer, so I proceeded anyway. It doesn't matter anymore though, as I have a new drive and another plan.

> You can use this disk image e.g. as a vnode-backed memory disk, see
> mdconfig(8). If you cannot get a good copy of the disk partitions it might be
> a good idea to get a quote from a professional hard drive data recovery
> company to do that for you. I've never had occasion to try this (hooray for
> backups) but I've heard it can be quite expensive. :-/

I'm going to try dd a second time, but this time I'll use ddrescue as some people suggested and I'll make the target drive an identical-sized 500 Gbyte drive, which I purchased today. I imagine it will take a long time to create this cloned disk... hopefully with fewer errors than dd gave me, though we'll see.

> Try using fsck_ffs on (copies of) the disk image to see if that can restore
> the damage. If the damage is beyond repair for fsck_ffs, you have a real
> problem. Of course is you have a good disk image, your data is still
> there, but you might have to use a forensics program like sysutils/sleuthkit
> or hexdump to try and piece files together. And even then you cannot be sure
> that there is no corrupted data in the files themselves. Good luck with that. :-(

Indeed some of the partitions seem to be beyond repair. In particular my /var partition is totally fubar'ed.

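For reference, Roland's suggestion of checking a copy of the image through a vnode-backed memory disk might look roughly like the sketch below. The file names var.img and var-work.img are hypothetical, and the script only prints the commands by default; setting DO_IT=1 (on FreeBSD, as root) would actually run them. It uses fsck_ffs -n, which answers "no" to every repair question and does not open the filesystem for writing, so the first pass only reports damage.

```shell
#!/bin/sh
# Sketch only: IMAGE and WORK are hypothetical file names.
IMAGE=${IMAGE:-var.img}        # image made from the failing partition
WORK=${WORK:-var-work.img}     # scratch copy to experiment on

# Print each command; execute it only when DO_IT=1 is set.
run() {
    echo "+ $*"
    if [ "${DO_IT:-0}" = 1 ]; then "$@"; fi
}

# Never experiment on the only image: work on a copy.
run cp "$IMAGE" "$WORK"

# Attach the copy as a vnode-backed memory disk; mdconfig prints the
# name of the device it created (for example md0).
if [ "${DO_IT:-0}" = 1 ]; then
    md=$(mdconfig -a -t vnode -f "$WORK")
else
    echo "+ mdconfig -a -t vnode -f $WORK"
    md=md0                     # placeholder device name for the dry run
fi

# Report-only check of the attached copy.
run fsck_ffs -n "/dev/$md"

# Detach the memory disk again (unit number = device name minus "md").
run mdconfig -d -u "${md#md}"
```

If the read-only pass looks recoverable, a repairing run such as fsck_ffs -y could then be tried against the same copy, leaving the original image untouched.
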
When using fsck_ffs I got all sorts of errors when trying to repair the partition, things like:

BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE

So I used the -b option suggested in the man page, "fsck_ffs -y -b 160 /dev/ad0s1d" and it ran and fixed a few things, but then stopped with the following error:

fsck_ufs: cannot alloc 4294967292 bytes for inoinfo

The worst part of all is that the /var partition would normally be okay to lose if it didn't have my MySQL database on it - the most important data on the server. I just about choked down a golf ball when I discovered my /var partition was in such rough shape and I might be forced to use real recovery tools, or hire a professional for $$$, or be out-of-luck. MySQL databases are normally stored in /var/db/mysql.

But then I remembered my MySQL server was actually running in a Jail environment, and therefore it was located at /usr/jails/myjail/var/db/mysql instead of /var/db/mysql, and therefore the jailed MySQL database was on a totally different partition. Lucky! And I was also very lucky that I could mount the large /usr partition in read-only mode and copy off the most critical files I needed, starting with the database. No errors on that part of the disk so far, at least with the few critical files I've copied over. Whew!

Until just a few minutes ago I didn't think there'd be a happy ending. But I've got the most critical data copied over now, the rest can wait. I'm going to go run dd a second time (well, ddrescue) now and then start work on the copy once it finishes, in a day or two.

One last thing...

On Tue, Aug 25, 2009 at 11:45 AM, Polytropon<freebsd@edvax.de> wrote:
>
> As it has been suggested, there are interesting tools in the
> ports collection. I'll post my "famous list" again. Among them,
> note ddrescue and dd_rescue. But base system tools such as the
> fetch program can help.
>
>
> System:
>     dd
>     fsck_ffs
>     clri
>     fsdb
>     fetch -rR <device>
>     recoverdisk (!)
>
> Ports:
>     ddrescue
>     dd_rescue
>     ffs2recov
>     magicrescue
>     testdisk
>     The Sleuth Kit:
>         fls
>         dls
>         ils
>         autopsy
>     scan_ffs
>     recoverjpeg
>     foremost
>     photorec

I just wanted to say: this is a great list. Once the ddrescue copy is complete, I'll start using some of the other tools and see what I can recover.

Thanks again to everyone for the help!

kelly
