Date: Tue, 14 Oct 2003 18:00:25 -0400
From: Eduard Martinescu <martines@rochester.rr.com>
To: DavidB <david@whatistruth.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: ATA failure with 4.6.2 & 250GB drive?
Message-ID: <1066168825.13151.23.camel@firestorm.crafts4life.com>
In-Reply-To: <3F8C3C97.3050405@whatistruth.net>
References: <3F8C3C97.3050405@whatistruth.net>

<shameless plug>
If you want to see the SMART information from the hard drives, and you
are running a recent 5-CURRENT (one that includes ATAng), you can also
test out the smartmontools package from
http://smartmontools.sourceforge.net

I just finished up some work on porting the code to FreeBSD, and if you
check out the latest CVS version (or the soon-to-be-released 5.21), you
can help me test it out and see what the drive itself is reporting.
</shameless plug>

The smartmontools package should also work with SCSI drives (utilizing
CAM), and that portion should work under a 4-STABLE release, although I
have to admit I haven't tested it.

I also plan on submitting a port for it in the very near future (just
waiting for the 5.21 release).

Ed

On Tue, 2003-10-14 at 14:12, DavidB wrote:
> Kevin Oberman wrote:
>
> >> Date: Tue, 14 Oct 2003 09:55:54 +0100
> >> From: Scott Mitchell <scott+freebsd@fishballoon.org>
> >> Sender: owner-freebsd-stable@freebsd.org
> >>
> >> On Mon, Oct 13, 2003 at 10:09:10AM +0100, Scott Mitchell wrote:
> >>
> >>> Hi all,
> >>>
> >>> Just installed a Maxtor 250GB PATA drive in one of our servers, to be used
> >>> as a backup staging area. This was actually a replacement for an identical
> >>> drive that appeared to have died after a month of service.
> >>>
> >>> Anyway, 2 days after this drive was installed I started seeing this in the
> >>> daily logs:
> >>>
> >>>> ad1s1e: hard error reading fsbn 850845887 of 425422912-425422943
> >>>> (ad1s1 bn 850845887; cn 52962 tn 180 sn 17) trying PIO mode
> >>>> ad1s1e: hard error reading fsbn 850845887 of 425422912-425422943
> >>>> (ad1s1 bn 850845887; cn 52962 tn 180 sn 17) status=59 error=40
> >>>> ad1s1e: hard error reading fsbn 850845887 of 425422912-425422943
> >>>> (ad1s1 bn 850845887; cn 52962 tn 180 sn 17) status=59 error=40
> >>>> ad1s1e: hard error reading fsbn 850845887 of 425422912-425422943
> >>>> (ad1s1 bn 850845887; cn 52962 tn 180 sn 17) status=59 error=40
> >>>
> >>> ...
> >>
> >> OK, swapped out the cable (from an 80- to 40-wire one, as it happened,
> >> although that should make no difference on a UDMA33 controller). Same
> >> errors appeared again while the backups were running.
> >>
> >> Some more information on how this drive is being used - we're dumping two
> >> vinum RAID5 volumes onto it, one local and one remote, writing to the
> >> backup disk over NFS. Both dumps kick off at 0300, with the remote one
> >> finishing at 0305 last night. The first ATA error appeared in the logs at
> >> 0325, while the local backup was still running. The last error was logged
> >> at 0355, but the backup itself didn't finish until nearly 0500.
> >>
> >> Anyone have any more ideas on how to diagnose this? It does occur to me
> >> that the daily periodic run also kicks off at 0301 but that is usually all
> >> done before 0330.
> >
> > It's a real drive problem, but possibly not a terminal one. (I had the
> > same issue on one of my drives a few months ago and it's fine now.)
> >
> > The problem is that the system is getting an error trying to read this
> > area of the disk. It's an unmapped bunch of bad blocks.
> > The system gets an unrecoverable error trying to read these blocks and
> > that is what you see reported. Since it can't read "good" data, it does
> > not relocate the bad data, but just leaves it there and reports errors
> > every time it tries to read the data.
> >
> > First, any files containing data stored in these blocks are probably
> > toast. Or, at least garbled. Sorry.
> >
> > The fix/workaround is to move the file(s) involved so that the damaged
> > blocks are marked free and relocated to spare space on the drive. You
> > can try to figure out just which file(s) use those blocks. There
> > might even be a reasonable way to do this...I just don't know what it
> > is.
> >
> > Another "fix" is to simply copy the drive onto another and then copy it
> > back. dd(1) will do the trick as will dump/restore. (I'd suggest the
> > dump/restore to copy the data out and dd to copy it back if the disks
> > have identical geometries.) Once the data is restored to the original
> > disk, the bad blocks will have been re-directed by the drive and will
> > no longer trouble you.
> >
> > Modern disks are pretty smart at error recovery, but some failures are
> > too sudden for the drive to be able to deal with them without losing
> > data.
>
> Regarding a fix:
>
> I had a similar read error message not long ago when dumping to tape, and
> wondered what it could mean. So I went to the hard drive
> manufacturer's website and downloaded a DOS tool to scan/repair the
> hard drive.
>
> Just to note an issue: I had one boot disk to check my hard drive,
> which was a Hitachi (HGST) drive in my laptop, and one for the Western
> Digital, which was the drive of concern. For some reason I used the
> software utility from Hitachi on the WD, which was a good thing, because
> it reported bad blocks and wouldn't fix them because it recognized that
> it wasn't their drive.
> Then I used the boot disk I had created for the WD
> utility and ran it (this is why it was a good thing). It did the scan and
> reported NO issues, so I checked its logs to see if it had logged fixing
> any problems. The utility's logs said the drive had no issues.
> Just to double check I re-ran the HGST tool and it didn't find any bad
> blocks. Hmm. Those knuckle-heads at Western Digital made the utility
> fix the bad blocks silently. I find this underhanded because you might
> not find out a disk is going bad until it is totally failing. Hmm. Wonder
> if this helps get it past the warranty before the drive completely fails.
> [You know that when 1: bad blocks show up, 2: you clean 'em up, 3: return
> to step 1, the drive will die the death in the near future.]
>
> So you can usually get utilities from the manufacturer {at least WD,
> HGST, and Seagate} to do some subset of: turn S.M.A.R.T. on and off,
> exercise the hard drive, scan for errors, repair errors, low-level
> format, ....
>
> That is, if you don't mind booting from a DOS boot disk to run the tool.
> WD is a little confusing about which one to grab. But note, as I found
> out, some manufacturers might silently repair certain issues.
>
> Hope this helps,
> David
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
--
Eduard Martinescu <martines@rochester.rr.com>
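
For anyone following the thread, the "status=59 error=40" pair in the
quoted log can be decoded by hand. The sketch below is only an
illustration, not anything taken from the ata driver or from
smartmontools. It assumes the two values are printed in hexadecimal and
that the bits follow the classic ATA status and error register layout.
The function names decode_bits and decode_log_line are made up for this
example.

```python
import re

# Bit names per the classic ATA status register layout (assumed here).
STATUS_BITS = {
    0x80: "BSY",    # drive busy
    0x40: "DRDY",   # drive ready
    0x20: "DF",     # device fault
    0x10: "DSC",    # seek complete
    0x08: "DRQ",    # data request
    0x04: "CORR",   # corrected data
    0x02: "IDX",    # index
    0x01: "ERR",    # error register holds details
}

# Bit names per the classic ATA error register layout (assumed here).
ERROR_BITS = {
    0x80: "ICRC/BBK",  # interface CRC error, or bad block on older drives
    0x40: "UNC",       # uncorrectable data error
    0x20: "MC",        # media changed
    0x10: "IDNF",      # sector ID not found
    0x08: "MCR",       # media change requested
    0x04: "ABRT",      # command aborted
    0x02: "TK0NF",     # track 0 not found
    0x01: "AMNF",      # address mark not found
}


def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table.items() if value & mask]


def decode_log_line(line):
    """Decode the status=/error= pair in a log line.

    Returns (status_names, error_names), or None if the line has no pair.
    Both fields are treated as two-digit hexadecimal (an assumption).
    """
    m = re.search(r"status=([0-9a-fA-F]{2})\s+error=([0-9a-fA-F]{2})", line)
    if m is None:
        return None
    status = int(m.group(1), 16)
    error = int(m.group(2), 16)
    return decode_bits(status, STATUS_BITS), decode_bits(error, ERROR_BITS)


if __name__ == "__main__":
    line = "(ad1s1 bn 850845887; cn 52962 tn 180 sn 17) status=59 error=40"
    print(decode_log_line(line))
```

Under those assumptions the error field decodes to UNC alone, which is
consistent with Kevin's reading that the drive hit an unrecoverable read
error on those blocks rather than a cabling or transfer problem.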