Date:      Mon, 03 Apr 2006 02:37:17 +1030
From:      Shane Ambler <Shane@007Marketing.com>
To:        <gayn.winters@bristolsystems.com>, FreeBSD Mailing Lists <freebsd-questions@freebsd.org>
Subject:   Re: Hard Disk problems
Message-ID:  <C0563ADD.3ED20%Shane@007Marketing.com>
In-Reply-To: <03aa01c65671$2ec95f00$6501a8c0@workdog>

On 3/4/06 2:49 AM, "Gayn Winters" <gayn.winters@bristolsystems.com> wrote:

>> [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Shane Ambler
>> Sent: Saturday, April 01, 2006 3:10 AM
>> To: FreeBSD Mailing Lists
>> Subject: Hard Disk problems
>> 
>> 
>> A few days ago I started getting some disk errors and can't seem to find a
>> reference to find a way to fix them (other than the obvious re-format)
>> 
>> 
>> The daily security run output contains the following (abbreviated)
>> 
>> Checking setuid files and devices:
>> find: /usr/ports/databases/db43/work/db-4.3.28/db: Input/output error
>> find: /usr/ports/devel/git/Makefile: Input/output error
>> 
>> ~ repeated 32 times for different files (thankfully all in the ports tree)
>> 
>> tower.home.com kernel log messages:
>>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=139102367
>>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102367
>> 
>> These 2 error codes are repeated a total of 38 times all with the same LBA
>> 
>> If I start in single user mode and do fsck it takes about half an hour to
>> get through and repeats similar errors many times for just about every
>> check it does.
>> 
>> Running #fsck -y >> fsckout (while in multiuser mode) is as follows -
>> followed by dmesg output since boot
>> 
>>> cat fsckout 
>> ** /dev/ad0s1a (NO WRITE)
>> ** Last Mounted on /

Snip

>> ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=1<ILLEGAL_LENGTH> LBA=139102393
>> 
>> 
>> 
>> 
>> -- 
>> 
>> Shane Ambler
> 
> Looks to me like your disk subsystem is dying.  Most likely it is just
> the disk ad0.  If you don't have a good backup, do that immediately.
> Get a new disk in there and test it thoroughly (with the manufacturer's
> diagnostics.)  If all is well, restore to it.  You'll probably want to
> reread the section in the Handbook on Moving to a Larger Disk, since
> this is a good time to rethink the sizes of your partitions.
> 
> Incidentally, you can just install the new disk (as ad1), install FBSD
> on it, and dump|restore from ad0 to ad1.
> 
> Once restored, you'll still have to clean up the damage.  This is easier
> if your new disk has a separate partition for user data, since you
> can use a fresh install of the OS, the ports, etc. and worry about
> repairing the user data as best you can.
> 
> Good luck!
> 
> -gayn
> 
> Bristol Systems Inc.
> 714/532-6776
> www.bristolsystems.com
> 
> 
> 

Thanks.

I was kinda thinking that might be the case. Space isn't an issue (it's a
120GB drive); this is mostly a testing/learning server at home. It runs squid
and a DNS cache for home use (my other half does a lot of auto-surfing to try
and make a few bucks) and Apache/MySQL for testing web development.

The files that showed up with I/O errors are all in /usr/ports, so no probs
there. I should be able to copy across what is readable to another drive
without any problems or real loss, and any worthwhile data there is easy to
replace.

I am fairly new to *nix and was looking to see if I could learn more about
disaster recovery. I thought there might be a chance that it was just bad
sectors that weren't getting mapped out automagically, and that I could learn
to fix it manually without reformatting. Now I know that if I see it happen
again I should just replace the disk as soon as I can.
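
For anyone checking logs like the ones quoted above, it helps to see whether the READ_DMA failures are concentrated on one block or spread across the disk. Below is a minimal sketch in POSIX sh and awk. It assumes each failure is logged on a single line containing `LBA=<number>`, as in the ad0 messages above; `lba_counts` is a hypothetical helper name, not a FreeBSD tool.

```shell
#!/bin/sh
# Hypothetical helper: count READ_DMA failures per LBA in kernel log text.
# Assumes each failure is logged on one line containing "LBA=<number>",
# as in the ad0 messages quoted in this thread.
lba_counts() {
    awk '/READ_DMA/ && match($0, /LBA=[0-9]+/) {
        # RSTART/RLENGTH span "LBA=<digits>"; skip the 4-char "LBA=" prefix.
        n[substr($0, RSTART + 4, RLENGTH - 4)]++
    }
    END { for (lba in n) print lba, n[lba] }' | sort -n
}

# Example: tally the failures in the current kernel message buffer.
# dmesg | lba_counts
```

Each output line is an LBA followed by how many failures it produced. One or two lines with large counts mean the errors are concentrated in a small area, while a long list means they are spread across the disk.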


-- 

Shane Ambler
Sales Department
007Marketing.com
Shane@007Marketing.com




