Date: Sun, 13 Sep 1998 12:37:06 -0700 (PDT)
From: "Michael V. Harding" <mvh@ix.netcom.com>
To: dbabler@Rigel.orionsys.com
Cc: freebsd-questions@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG
Subject: Re: HELP! IDE Hard Errors?
Message-ID: <199809131937.MAA16992@netcom1.netcom.com>
In-Reply-To: <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com> (message from David Babler on Sun, 13 Sep 1998 11:34:49 -0700 (PDT))
References: <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com>
I had similar problems with a 1.6GB WD IDE drive, including fixing the
drive and having it run fine for a while - the drive did eventually
catch fire and burn to the ground (figuratively). I don't think there
is anything wrong with the kernel...
Mike Harding
Date: Sun, 13 Sep 1998 11:34:49 -0700 (PDT)
From: David Babler <dbabler@Rigel.orionsys.com>
cc: freebsd-stable@FreeBSD.ORG
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
X-Loop: FreeBSD.ORG
I've been getting kernel error messages relating to one of my IDE hard
drives. They started out as missed interrupt warnings and escalated to
hard and soft write errors. After the hard errors appeared, I ran fsck...
and a couple of minutes later the system rebooted. No log entry, no dump,
no nuthin'.
The same thing happened several months ago with this drive's predecessor (a
1.6GB WD IDE drive), and it ended with a panicked replacement of what seemed
to be an unfixable disk. In that case, FBSD reported hard errors and the drive
was removed. Since the original drive was under warranty, I was all set to
ship it back to Western Digital. After all, the conventional wisdom is
that if an IDE drive *reports* a hard error, the drive must be toast,
because all of its hidden reserved sectors have already been used up mapping
out bad ones. WD wants you to run their diagnostic program before you do RMAs, so I
popped the drive into a DOS machine and ran wddiag. It said, sure enough,
that there were bad sectors. It then asked if I WANTED TO FIX THEM! Uh,
okay, I said yes - what did I have to lose? The "bad" sectors went away
and the drive passed days of running drive tests. It's been running
without a hitch ever since in a Novell file server.
My question is this: is there something REALLY wrong here, or is there
some facet of FBSD (or the kernel interaction with the drive) that is
preventing normal operation? If you look at the message log below, the
SAME section of the disk is reported over and over - isn't there a
mechanism to map that as bad and never look at it again? Is the kernel
disabling IDE "smarts" and so preventing the drive from doing automatic
repairs? Do I really need to take the system down, boot DOS and run
the Western Digital diagnostics to fix this? That seems like a giant leap
backward to me.
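One way to confirm that the errors keep landing on the same spot is to tally the `wd1 bn` values in the kernel log. Below is a minimal sketch in Python, assuming the log lines have the same shape as the ones quoted at the end of this message (the `(wd1 bn N; ...)` part on a single line); `count_bad_blocks` is a hypothetical helper, not a FreeBSD tool.

```python
import re
from collections import Counter

# Matches the "(wd1 bn 1864576; cn 1973 tn 1 sn 28)" part of an error line.
# Assumption: the drive is wd1 and the format matches the log quoted below.
BN_RE = re.compile(r"\(wd1 bn (\d+);")

def count_bad_blocks(lines):
    """Return a Counter mapping each reported block number to its error count."""
    counts = Counter()
    for line in lines:
        m = BN_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# A few continuation lines copied from the message log.
sample = [
    "1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)",
    "1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)",
    "1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)",
]
print(count_bad_blocks(sample).most_common())
```

Fed every line of the message log in this post, it counts block 1864576 seven times and block 1864577 twice.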
I tried running bad144 on the first disk that showed this problem, without
success (the bad144 table was bad, corrupted, or didn't exist; I forget the
details now). I also tried badsect, with no better result.
Heeeeeelllppp! How do I fix this?
-Dave
---- System
FreeBSD 2.2.6-STABLE #0: Wed Jun 3 11:42:03 PDT 1998
CPU: i486 DX2 (486-class CPU)
Origin = "GenuineIntel" Id = 0x435 Stepping=5
Features=0x3<FPU,VME>
real memory = 20971520 (20480K bytes)
avail memory = 18386944 (17956K bytes)
wd1: 4924MB (10085040 sectors), 10672 cyls, 15 heads, 63 S/T, 512 B/S
> disklabel /wd1
# /dev/rwd1c:
type: ESDI
disk: wd1s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 15
sectors/cylinder: 945
cylinders: 10671
sectors/unit: 10084977
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 10084977        0    unused        0     0         # (Cyl.    0 - 10671*)
  e:  5120000        0    4.2BSD        0     0     0   # (Cyl.    0 - 5417*)
  f:  4964977  5120000    4.2BSD        0     0     0   # (Cyl. 5417*- 10671*)
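The label's numbers are consistent with the probe line: 63 sectors/track times 15 tracks/cylinder gives the 945 sectors/cylinder shown, and the cylinder ranges in the partition comments follow from each partition's offset and size divided by 945. The sketch below reproduces those three comments. `cyl_range` is a hypothetical helper, and treating `*` as "this edge falls partway through a cylinder" is an assumption inferred from the output, not taken from the disklabel source.

```python
SECTORS_PER_TRACK = 63
TRACKS_PER_CYLINDER = 15
SECTORS_PER_CYL = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER  # 945, as in the label

def cyl_range(offset, size, spc=SECTORS_PER_CYL):
    """Describe the cylinders a partition spans, in disklabel-comment style.

    Assumption: '*' marks an edge that does not fall on a cylinder boundary.
    """
    end_sector = offset + size           # first sector past the partition
    start = offset // spc                # cylinder holding the first sector
    end = (end_sector - 1) // spc        # cylinder holding the last sector
    start_mark = "*" if offset % spc else ""
    end_mark = "*" if end_sector % spc else ""
    return f"Cyl. {start}{start_mark} - {end}{end_mark}"

# (offset, size) pairs copied from the label above.
partitions = {"c": (0, 10084977), "e": (0, 5120000), "f": (5120000, 4964977)}
for name, (offset, size) in partitions.items():
    print(name, cyl_range(offset, size))
```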
---- message log
Sep 11 11:53:37 Rigel /kernel: wd1: status 50<rdy,seekdone> error 0
Sep 11 11:56:15 Rigel /kernel: wd1: interrupt timeout:
Sep 11 11:56:16 Rigel /kernel: wd1: status 58<rdy,seekdone,drq> error 1<no_dam>
Sep 11 20:51:14 Rigel /kernel: wd1: wdunwedge failed:
Sep 11 20:51:20 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 02:06:57 Rigel /kernel: wd1: wdunwedge failed:
Sep 12 02:07:02 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 18:18:38 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:16 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:46 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:25 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:54 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
[system rebooted]
Sep 13 10:28:46 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:20 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:53 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 13 10:30:24 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
wd1: status 51<rdy,seekdone,err> error 4<abort>
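Each error line also gives the failing address in the drive's geometry: `cn`/`tn`/`sn` are just `bn` split by the 945 sectors per cylinder and 63 sectors per track from the label (1864576 = 1973 x 945 + 91, and 91 = 1 x 63 + 28). A minimal sketch of that arithmetic, assuming zero-based sector numbers (which is what the `sn` values here imply), reproduces both addresses in the log. `bn_to_chs` is a hypothetical helper. Note also that the failing write range 1864576-1864591 is 16 sectors, or 8 KB.

```python
SECTORS_PER_TRACK = 63      # "63 S/T" from the probe line
TRACKS_PER_CYLINDER = 15    # "15 heads" from the probe line

def bn_to_chs(bn, spt=SECTORS_PER_TRACK, heads=TRACKS_PER_CYLINDER):
    """Split an absolute block number into (cylinder, track, sector).

    Assumption: sector numbers are zero-based, matching the "sn" field above.
    """
    cylinder, rem = divmod(bn, spt * heads)
    track, sector = divmod(rem, spt)
    return cylinder, track, sector

print(bn_to_chs(1864576))  # (1973, 1, 28)
print(bn_to_chs(1864577))  # (1973, 1, 29)
```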
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
