From: Shane Ambler
Date: Mon, 03 Apr 2006 02:37:17 +1030
Subject: Re: Hard Disk problems

On 3/4/06 2:49 AM, "Gayn Winters" wrote:

>> [mailto:owner-freebsd-questions@freebsd.org] On Behalf Of Shane Ambler
>> Sent: Saturday, April 01, 2006 3:10 AM
>> To: FreeBSD Mailing Lists
>> Subject: Hard Disk problems
>>
>> A few days ago I started getting some disk errors and can't seem to
>> find a reference on how to fix them (other than the obvious reformat).
>>
>> The daily security run output contains the following (abbreviated):
>>
>> Checking setuid files and devices:
>> find: /usr/ports/databases/db43/work/db-4.3.28/db: Input/output error
>> find: /usr/ports/devel/git/Makefile: Input/output error
>>
>> ~ repeated 32 times for different files (thankfully all in the ports tree)
>>
>> tower.home.com kernel log messages:
>>> ad0: FAILURE - READ_DMA status=51 error=40 LBA=139102367
>>> ad0: FAILURE - READ_DMA status=51 error=1 LBA=139102367
>>
>> These 2 error codes are repeated a total of 38 times, all with the
>> same LBA.
>>
>> If I start in single-user mode and run fsck, it takes about half an
>> hour to get through and repeats similar errors many times for just
>> about every check it does.
>>
>> Running #fsck -y >> fsckout (while in multiuser mode) gives the
>> following, followed by dmesg output since boot:
>>
>>> cat fsckout
>> ** /dev/ad0s1a (NO WRITE)
>> ** Last Mounted on /
>> [snip]
>> ad0: FAILURE - READ_DMA status=51 error=1 LBA=139102393
>>
>> --
>> Shane Ambler
>
> Looks to me like your disk subsystem is dying. Most likely it is just
> the disk ad0. If you don't have a good backup, make one immediately.
> Get a new disk in there and test it thoroughly (with the manufacturer's
> diagnostics). If all is well, restore to it. You'll probably want to
> reread the section in the Handbook on Moving to a Larger Disk, since
> this is a good time to rethink the sizes of your partitions.
>
> Incidentally, you can just install the new disk (as ad1), install FBSD
> on it, and dump|restore from ad0 to ad1.
>
> Once restored, you'll still have to clean up the damage. This is easier
> if your new disk has a separate partition for user data, since you can
> use a fresh install of the OS, the ports, etc. and worry about repairing
> the user data as best you can.
>
> Good luck!
>
> -gayn
>
> Bristol Systems Inc.
> 714/532-6776
> www.bristolsystems.com

Thanks. I was kinda thinking that might be the case.
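The kernel messages quoted above can be summarised per sector with a short script. This is a minimal sketch in Python, not a FreeBSD tool: it assumes the log lines look like the ones in this thread and that the disk uses 512-byte sectors, and the names `tally_read_failures` and `lba_to_offset` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for lines like the ones quoted in this thread:
#   ad0: FAILURE - READ_DMA status=51 error=40 LBA=139102367
# Assumes device, command, status, error and LBA appear in that order;
# other kernel versions may word the message differently.
FAILURE_RE = re.compile(
    r"(?P<dev>ad\d+): FAILURE - (?P<cmd>\w+) "
    r"status=(?P<status>\w+).*?"
    r"error=(?P<error>\w+).*?"
    r"LBA=(?P<lba>\d+)"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def tally_read_failures(lines):
    """Count matching failure messages per (device, LBA) pair."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("lba")))] += 1
    return counts


def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Return the byte offset of an LBA from the start of the disk."""
    return lba * sector_size


if __name__ == "__main__":
    sample = [
        "ad0: FAILURE - READ_DMA status=51 error=40 LBA=139102367",
        "ad0: FAILURE - READ_DMA status=51 error=1 LBA=139102367",
        "ad0: FAILURE - READ_DMA status=51 error=1 LBA=139102393",
    ]
    for (dev, lba), count in sorted(tally_read_failures(sample).items()):
        offset_gb = lba_to_offset(lba) / 1e9  # decimal gigabytes
        print(f"{dev} LBA={lba} failures={count} offset~{offset_gb:.1f} GB")
```

Because the function only iterates over lines, an open log file such as /var/log/messages can be passed in place of the sample list; `search` also tolerates the syslog timestamp prefix. The two LBAs quoted in the thread are only 26 sectors apart, roughly 71 GB into the 120 GB disk.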
Space isn't an issue (it's a 120GB drive); this is mostly a testing/learning server at home. It runs Squid and a DNS cache for home use (my other half does a lot of auto-surfing to try to make a few bucks) and Apache/MySQL for testing web development. The files that showed up with I/O errors are all in /usr/ports, so no problems there. I should be able to copy whatever is readable across to another drive without any real loss, and any worthwhile data there is easy to replace.

I am fairly new to *nix and was looking to see if I could learn more about disaster recovery. I thought there might be a chance that it was just bad sectors that weren't getting mapped out automagically, and that I could learn to fix them manually without reformatting. Now I know that if I see this happen again, I should just replace the disk as soon as I can.

--
Shane Ambler
Sales Department
007Marketing.com
Shane@007Marketing.com