Date: Sun, 13 Sep 1998 11:34:49 -0700 (PDT)
From: David Babler <dbabler@Rigel.orionsys.com>
To: freebsd-questions@FreeBSD.ORG
Cc: freebsd-stable@FreeBSD.ORG
Subject: HELP! IDE Hard Errors?
Message-ID: <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com>
I've been getting kernel error messages about one of my IDE hard drives. They started out as missed-interrupt warnings and escalated to hard and soft write errors. After the hard errors appeared, I ran fsck... and a couple of minutes later the system rebooted. No log entry, no dump, no nuthin'.

The same thing happened several months ago with this drive's predecessor (a 1.6GB WD IDE drive), and it ended with me in a panic to replace what seemed to be an unfixable disk. In that case, FBSD reported hard errors and I pulled the drive. Since it was still under warranty, I was all set to ship it back to Western Digital. After all, the conventional wisdom is that if an IDE drive *reports* a hard error, the drive must be toast, because all of its hidden reserved sectors have already been mapped out.

WD wants you to run their diagnostic program before you request an RMA, so I popped the drive into a DOS machine and ran wddiag. Sure enough, it said there were bad sectors. Then it asked if I WANTED TO FIX THEM! Uh, okay, I said yes; what did I have to lose? The "bad" sectors went away and the drive passed days of drive tests. It has been running without a hitch ever since in a Novell file server.

My question is this: is there something REALLY wrong here, or is there some facet of FBSD (or the kernel's interaction with the drive) that is preventing normal operation? If you look at the message log below, the SAME section of the disk is reported over and over. Isn't there a mechanism to map that as bad and never look at it again? Is the kernel disabling some IDE "smarts" that would otherwise let the drive do automatic repairs? Do I really need to down the system, boot DOS and run the Western Digital diagnostics to fix this? Seems like a giant leap backward to me.

I tried to run bad144 on the first disk that showed this problem, without success (the table was bad, corrupted or didn't exist; I forget the details now). I also tried badsect, with no good result.
Heeeeeelllppp! How do I fix this?

-Dave

---- System

FreeBSD 2.2.6-STABLE #0: Wed Jun 3 11:42:03 PDT 1998
CPU: i486 DX2 (486-class CPU)
  Origin = "GenuineIntel"  Id = 0x435  Stepping=5
  Features=0x3<FPU,VME>
real memory  = 20971520 (20480K bytes)
avail memory = 18386944 (17956K bytes)
wd1: 4924MB (10085040 sectors), 10672 cyls, 15 heads, 63 S/T, 512 B/S

> disklabel /wd1
# /dev/rwd1c:
type: ESDI
disk: wd1s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 15
sectors/cylinder: 945
cylinders: 10671
sectors/unit: 10084977
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 10084977        0    unused        0     0         # (Cyl.    0 - 10671*)
  e:  5120000        0    4.2BSD        0     0     0   # (Cyl.    0 - 5417*)
  f:  4964977  5120000    4.2BSD        0     0     0   # (Cyl. 5417*- 10671*)

---- message log

Sep 11 11:53:37 Rigel /kernel: wd1: status 50<rdy,seekdone> error 0
Sep 11 11:56:15 Rigel /kernel: wd1: interrupt timeout:
Sep 11 11:56:16 Rigel /kernel: wd1: status 58<rdy,seekdone,drq> error 1<no_dam>
Sep 11 20:51:14 Rigel /kernel: wd1: wdunwedge failed:
Sep 11 20:51:20 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 02:06:57 Rigel /kernel: wd1: wdunwedge failed:
Sep 12 02:07:02 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 18:18:38 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:16 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:46 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:25 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:54 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
[system rebooted]
Sep 13 10:28:46 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of 1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29) wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:20 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of 1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29) wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:53 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 13 10:30:24 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28) wd1: status 51<rdy,seekdone,err> error 4<abort>
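Reading the log is easier once you see how its numbers relate. The sketch below is a hedged illustration, not FreeBSD driver code. It decodes the hex status and error bytes using the standard ATA status and error register bit layouts. It also turns a partition-relative fsbn into the bn/cn/tn/sn triple using the disklabel geometry (15 tracks/cylinder, 63 sectors/track, partition e at offset 0). The helper names `decode` and `bn_to_chs` are hypothetical. Bit labels that never appear in the log (for example `fault`, `corr`, `badblk`, `uncorr`) are assumed names, not necessarily what the kernel prints.

```python
"""Decode the wd1 log fields (illustrative sketch, not the wd driver's code)."""

# ATA status register bits, high to low. Labels seen in the log match the
# kernel output; the remaining labels are assumed names.
STATUS_BITS = [
    (0x80, "busy"), (0x40, "rdy"), (0x20, "fault"), (0x10, "seekdone"),
    (0x08, "drq"), (0x04, "corr"), (0x02, "index"), (0x01, "err"),
]

# ATA error register bits, high to low (same labelling caveat).
ERROR_BITS = [
    (0x80, "badblk"), (0x40, "uncorr"), (0x20, "mc"), (0x10, "id_not_found"),
    (0x08, "mcr"), (0x04, "abort"), (0x02, "tk0nf"), (0x01, "no_dam"),
]


def decode(value, bits):
    """Render a register byte the way the log does, e.g. 0x51 -> '51<rdy,seekdone,err>'."""
    names = [name for mask, name in bits if value & mask]
    if not names:
        return f"{value:x}"  # the log prints a bare "0" when no bits are set
    return f"{value:x}<{','.join(names)}>"


def bn_to_chs(fsbn, part_offset=0, heads=15, sectors_per_track=63):
    """Map a partition-relative block number to (bn, cn, tn, sn).

    Geometry defaults come from the disklabel above; sectors are counted
    from 0, which is what makes the result line up with the log.
    """
    bn = fsbn + part_offset              # absolute sector number on wd1
    per_cyl = heads * sectors_per_track  # 945 sectors per cylinder
    cn, rem = divmod(bn, per_cyl)
    tn, sn = divmod(rem, sectors_per_track)
    return bn, cn, tn, sn


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS), decode(0x04, ERROR_BITS))
    print(decode(0x58, STATUS_BITS), decode(0x01, ERROR_BITS))
    print(bn_to_chs(1864576))  # partition e starts at offset 0
    print(bn_to_chs(1864577))
```

With these assumptions, fsbn 1864576 works out to cylinder 1973, head 1, sector 28. The next block, 1864577, lands on sector 29. Both match the log. The range 1864576-1864591 is 16 sectors, or one 8 KB write. In every hard and soft write error, the error byte decodes to the abort bit alone, not to the bad-block or uncorrectable bits.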