Date: Mon, 03 Apr 2006 02:37:17 +1030
From: Shane Ambler <Shane@007Marketing.com>
To: <gayn.winters@bristolsystems.com>, FreeBSD Mailing Lists <freebsd-questions@freebsd.org>
Subject: Re: Hard Disk problems
Message-ID: <C0563ADD.3ED20%Shane@007Marketing.com>
In-Reply-To: <03aa01c65671$2ec95f00$6501a8c0@workdog>

On 3/4/06 2:49 AM, "Gayn Winters" <gayn.winters@bristolsystems.com> wrote:

>> [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Shane Ambler
>> Sent: Saturday, April 01, 2006 3:10 AM
>> To: FreeBSD Mailing Lists
>> Subject: Hard Disk problems
>>
>>
>> A few days ago I started getting some disk errors and can't seem to find a
>> reference to find a way to fix them (other than the obvious re-format)
>>
>>
>> The daily security run output contains the following (abbreviated)
>>
>> Checking setuid files and devices:
>> find: /usr/ports/databases/db43/work/db-4.3.28/db: Input/output error
>> find: /usr/ports/devel/git/Makefile: Input/output error
>>
>> ~ repeated 32 times for different files (thankfully all in the ports tree)
>>
>> tower.home.com kernel log messages:
>>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
>>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
>>
>> These 2 error codes are repeated a total of 38 times all with the same LBA
>>
>> If I start in single user mode and do fsck it takes about half an hour to
>> get through and repeats similar errors many times for just about every check
>> it does.
>>
>> Running #fsck -y >> fsckout (while in multiuser mode) is as follows -
>> followed by dmesg output since boot
>>
>>> cat fsckout
>> ** /dev/ad0s1a (NO WRITE)
>> ** Last Mounted on /

Snip

>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102393
>>
>> --
>>
>> Shane Ambler
>
> Looks to me like your disk subsystem is dying. Most likely it is just
> the disk ad0. If you don't have a good backup, do that immediately.
> Get a new disk in there and test it thoroughly (with the manufacturer's
> diagnostics.) If all is well, restore to it. You'll probably want to
> reread the section in the Handbook on Moving to a Larger Disk, since
> this is a good time to rethink the sizes of your partitions.
>
> Incidentally, you can just install the new disk (as ad1), install FBSD
> on it, and dump|restore from ad0 to ad1.
>
> Once restored, you'll still have to clean up the damage. This is easier
> if your new disk has a separate partition for user data, since you
> can use a fresh install of the OS, the ports, etc. and worry about
> repairing the user data as best you can.
>
> Good luck!
>
> -gayn
>
> Bristol Systems Inc.
> 714/532-6776
> www.bristolsystems.com
>

Thanks. I was kinda thinking that might be the case.

Space isn't an issue (it's a 120GB drive). This is mostly a testing/learning
server at home: it runs squid and a DNS cache for home use (my other half does
a lot of auto-surfing to try and make a few bucks) and apache/mysql for testing
web devel.

The files that showed up as I/O errors are all in /usr/ports, so no probs
there. I should be able to copy across what is readable to another drive
without any problems or real loss, and any worthwhile data there is easy to
replace.

I am fairly new to *nix and was looking to see if I could learn more disaster
recovery. I thought there might be a chance that it was just bad sectors that
weren't getting mapped out automagically, and that I could learn to fix it
manually without reformatting.

Now I know that if I see it happen again I should just replace the disk as
soon as I can.

--

Shane Ambler
Sales Department
007Marketing.com
Shane@007Marketing.com
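
For anyone tracking a similar failure: every kernel message quoted in this thread carries an LBA, so tallying the messages per LBA gives a rough view of whether the errors sit on one spot or are turning up at new addresses. What follows is only a minimal Python sketch. It assumes the log has been unwrapped to one message per line and that the lines follow the ad0 format quoted here. The regular expression and the function name `tally_failures` are illustrative and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Pattern modelled on the ad0 lines quoted in this thread (an assumption);
# other drivers or FreeBSD releases may format these messages differently.
FAILURE_RE = re.compile(
    r"(?P<dev>ad\d+): FAILURE - (?P<op>\w+) "
    r"status=\w+<[^>]*> error=\w+<(?P<err>[^>]*)> LBA=(?P<lba>\d+)"
)


def tally_failures(log_text):
    """Count FAILURE messages per (device, LBA) pair."""
    counts = Counter()
    for match in FAILURE_RE.finditer(log_text):
        counts[(match.group("dev"), int(match.group("lba")))] += 1
    return counts


# Sample input: the three distinct messages quoted in the thread, unwrapped.
sample = """\
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102393
"""

for (dev, lba), n in sorted(tally_failures(sample).items()):
    print(f"{dev} LBA {lba}: {n} failure(s)")
```

Fed the three sample lines, it reports two failures at LBA 139102367 and one at LBA 139102393. A tally like this only summarises what the kernel logged. It does not repair anything and is no substitute for the backup and replacement advice above.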