Date: Fri, 15 Apr 2005 11:19:04 -0400
From: Brian Fundakowski Feldman
To: Bill Vermillion
cc: freebsd-hackers@freebsd.org
Subject: Re: immenent disk failure ?
Message-ID: <20050415151904.GR981@green.homeunix.org>
In-Reply-To: <20050415141052.GB96815@wjv.com>

On Fri, Apr 15, 2005 at 10:10:52AM -0400, Bill Vermillion wrote:
> On or about Fri, Apr 15, 2005 at 12:01 , while attempting a
> Zarathustra emulation freebsd-hackers-request@freebsd.org thus spake:
>
> >
> > Message: 4
> > Date: Thu, 14 Apr 2005 10:58:02 -0500 (CDT)
> > From: "H. S."
> > Subject: imminent disk failure ?
> > ...
> >
> > I have a server running 4.X for almost two years now, without
> > problems - rock solid as it should be - yesterday the server
> > became unresponsive, now that I have access again, and while
> > checking the logs, I found this as the last message before the
> > unresponsiveness:
> >
> > /kernel: ad0: READ command timeout tag=0 serv=0 - resetting
> >
> > The next message is the system getting back on, 1 hour later.
> >
> > I have not changed anything kernel-related on this system for
> > a long time (jul 2004), just apply the occasional kernel patch
> > and rebuild/reboot the system. I never encountered this problem
> > before. Could this message mean this disk is giving its last
> > breaths ?
>
> It might help if we knew a bit more about the system such
> as drive make and model - you can see that in dmesg. That may
> point out some device that is known to be problematic.
>
> The last time I got timeout errors like that was in the 3.x era
> with a SCSI controller. Last IDE problem I had was a bad read
> that forced the system into PIO mode with over 75% performance
> decrease. The only way around that one that I was aware of was a
> reboot.

For any disk made within perhaps the last five years you should be
able to just use SMART to perform a thorough health test on your hard
drives and view their statistics and error logs.
ports/sysutils/smartmontools works great for ATA; I don't know why it
doesn't currently do much on SCSI.

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\
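
As a rough illustration of the SMART check suggested above, here is a minimal sketch of the smartctl invocations (from smartmontools) that cover a health test, the statistics, and the error logs. The device name /dev/ad0 is an assumption taken from the "ad0" in the quoted kernel message, and the helper names smart_commands and run_checks are hypothetical, not part of smartmontools. By default it only prints the commands; actually running them needs root and an installed smartctl.

```python
import shutil
import subprocess


def smart_commands(device):
    """Build the smartctl command lines for a basic check of `device`."""
    return [
        ["smartctl", "-H", device],              # overall health self-assessment
        ["smartctl", "-A", device],              # vendor attribute table (statistics)
        ["smartctl", "-l", "error", device],     # the drive's own error log
        ["smartctl", "-t", "long", device],      # start an extended self-test
        ["smartctl", "-l", "selftest", device],  # self-test log (read after the test ends)
    ]


def run_checks(device, dry_run=True):
    """Print the commands, or run them if dry_run is False and smartctl exists."""
    for argv in smart_commands(device):
        if dry_run or shutil.which("smartctl") is None:
            print(" ".join(argv))
        else:
            # smartctl uses a bitmask exit status, so a non-zero code is
            # not treated as a Python error here.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    # Assumed device name; adjust to whatever dmesg reports for the disk.
    run_checks("/dev/ad0", dry_run=True)
```

The extended test started by "-t long" runs inside the drive in the background, so the self-test log is only meaningful once the drive reports the test as finished.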