Date:      Sun, 13 Sep 1998 11:34:49 -0700 (PDT)
From:      David Babler <dbabler@Rigel.orionsys.com>
To:        freebsd-questions@FreeBSD.ORG
Cc:        freebsd-stable@FreeBSD.ORG
Subject:   HELP! IDE Hard Errors?
Message-ID:  <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com>

I've been getting kernel error messages relating to one of my IDE hard
drives. They started out as missed interrupt warnings and escalated to
hard and soft write errors. After the hard errors appeared, I ran fsck...
and a couple of minutes later the system rebooted. No log entry, no dump,
no nuthin'. 

The same thing happened several months ago with this drive's predecessor (a
1.6GB WD IDE drive), which left me scrambling to replace what seemed to be an
unfixable disk. In that case FBSD reported hard errors, so I pulled the drive.
Since it was still under warranty, I was all set to ship it back to Western
Digital. After all, the conventional wisdom is that if an IDE drive *reports*
a hard error, the drive must be toast, because all of its hidden spare sectors
have already been used up remapping earlier defects. WD wants you to run their
diagnostic program before you do an RMA, so I popped the drive into a DOS
machine and ran wddiag. Sure enough, it said there were bad sectors. Then it
asked if I WANTED TO FIX THEM! Uh, okay, I said yes; what did I have to lose?
The "bad" sectors went away and the drive passed days of continuous testing.
It's been running without a hitch ever since in a Novell file server.

My question is this: is there something REALLY wrong here, or is there
some facet of FBSD (or the kernel interaction with the drive) that is
preventing normal operation? If you look at the message log below, the
SAME section of the disk is reported over and over - isn't there a
mechanism to map that as bad and never look at it again? Is the kernel
disabling some IDE "smarts", which keeps the drive from doing its own
automatic repairs? Do I really need to down the system, boot DOS and run
the Western Digital diagnostics to fix this? Seems like a giant leap
backward to me.

I tried running bad144 on the first disk that showed this problem, without
success (the bad-sector table was invalid, corrupted, or missing; I forget
the details now). I also tried badsect, with no better result.

Heeeeeelllppp! How do I fix this?

-Dave

---- System

FreeBSD 2.2.6-STABLE #0: Wed Jun  3 11:42:03 PDT 1998
CPU: i486 DX2 (486-class CPU)
  Origin = "GenuineIntel"  Id = 0x435  Stepping=5
  Features=0x3<FPU,VME>
real memory  = 20971520 (20480K bytes)
avail memory = 18386944 (17956K bytes)

wd1: 4924MB (10085040 sectors), 10672 cyls, 15 heads, 63 S/T, 512 B/S
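
The probe line's capacity follows directly from its CHS geometry. A quick
Python sketch, using only the numbers printed above and assuming the
driver's "MB" means 2^20 bytes, rounded down:

```python
# Geometry as printed by the wd1 probe line
CYLS, HEADS, SECTORS_PER_TRACK, BYTES_PER_SECTOR = 10672, 15, 63, 512

# Total addressable sectors = cylinders * heads * sectors/track
total_sectors = CYLS * HEADS * SECTORS_PER_TRACK

# Assumption: the driver's "MB" is 2**20 bytes, truncated toward zero
megabytes = total_sectors * BYTES_PER_SECTOR // (1024 * 1024)

print(total_sectors, megabytes)  # 10085040 4924
```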

> disklabel /wd1
# /dev/rwd1c:
type: ESDI
disk: wd1s1
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 15
sectors/cylinder: 945
cylinders: 10671
sectors/unit: 10084977
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0		# milliseconds
track-to-track seek: 0	# milliseconds
drivedata: 0 

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  c: 10084977        0    unused        0     0       	# (Cyl.    0 - 10671*)
  e:  5120000        0    4.2BSD        0     0     0 	# (Cyl.    0 - 5417*)
  f:  4964977  5120000    4.2BSD        0     0     0 	# (Cyl. 5417*- 10671*)
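
The partition table can be cross-checked the same way. With 945 sectors
per cylinder (63 * 15), the e and f partitions should exactly tile the
slice described by c, and their offsets and sizes should reproduce the
cylinder ranges in the comments. A Python sketch, assuming disklabel's "*"
means "this boundary does not fall on a cylinder boundary" (my reading,
not taken from disklabel's source):

```python
SECTORS_PER_CYL = 945  # sectors/cylinder from the label (63 * 15)

# (name, size, offset) copied from the disklabel output above
partitions = [("c", 10084977, 0), ("e", 5120000, 0), ("f", 4964977, 5120000)]

def cyl_range(size, offset, spc=SECTORS_PER_CYL):
    """Return an 'a - b' cylinder range; '*' marks a start or end that
    is not on a cylinder boundary (assumed meaning of disklabel's '*')."""
    start, end = offset, offset + size  # end is exclusive
    first = f"{start // spc}{'*' if start % spc else ''}"
    last = f"{(end - 1) // spc}{'*' if end % spc else ''}"
    return f"{first} - {last}"

for name, size, offset in partitions:
    print(name, cyl_range(size, offset))

# e and f should exactly tile the slice described by c
by_name = {name: (size, offset) for name, size, offset in partitions}
assert by_name["e"][0] + by_name["f"][0] == by_name["c"][0]
assert by_name["f"][1] == by_name["e"][1] + by_name["e"][0]
```

This prints "0 - 10671*" for c, "0 - 5417*" for e and "5417* - 10671*"
for f, matching the label.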

---- message log

Sep 11 11:53:37 Rigel /kernel: wd1: status 50<rdy,seekdone> error 0
Sep 11 11:56:15 Rigel /kernel: wd1: interrupt timeout:
Sep 11 11:56:16 Rigel /kernel: wd1: status 58<rdy,seekdone,drq> error 1<no_dam>
Sep 11 20:51:14 Rigel /kernel: wd1: wdunwedge failed:
Sep 11 20:51:20 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 02:06:57 Rigel /kernel: wd1: wdunwedge failed:
Sep 12 02:07:02 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
Sep 12 18:18:38 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:16 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:19:46 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:25 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 12 18:20:54 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
[system rebooted]
Sep 13 10:28:46 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
 1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
 wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:20 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
 1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
 wd1: status 50<rdy,seekdone> error 4<abort>
Sep 13 10:29:53 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
Sep 13 10:30:24 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
 1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
 wd1: status 51<rdy,seekdone,err> error 4<abort>
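
To read the log lines: the status and error values appear to be printed
in hex (status 58 only makes sense as 0x58 = rdy + seekdone + drq), with
bits laid out as in the standard ATA status and error registers, and the
cn/tn/sn triple is just the block number split using the label's 63
sectors/track and 15 tracks/cylinder. A Python sketch under those
assumptions; the bit names that do not appear in the log are illustrative
labels chosen here, not necessarily what the wd driver prints:

```python
# ATA status and error register bits, high bit first. Names for bits that
# appear in the log above are copied from it; the rest are hypothetical
# labels based on the usual ATA mnemonics.
STATUS_BITS = [(0x80, "busy"), (0x40, "rdy"), (0x20, "wrtflt"), (0x10, "seekdone"),
               (0x08, "drq"), (0x04, "corr"), (0x02, "index"), (0x01, "err")]
ERROR_BITS = [(0x80, "badblk"), (0x40, "uncorr"), (0x20, "mc"), (0x10, "idnf"),
              (0x08, "mcr"), (0x04, "abort"), (0x02, "tk0nf"), (0x01, "no_dam")]

def decode(value_hex, bits):
    """Decode a register value printed in hex (e.g. '51') into bit names."""
    value = int(value_hex, 16)
    return [name for mask, name in bits if value & mask]

def to_chs(bn, sectors_per_track=63, tracks_per_cyl=15):
    """Split a block number into (cylinder, track, sector), 0-based."""
    cyl, rem = divmod(bn, sectors_per_track * tracks_per_cyl)
    track, sector = divmod(rem, sectors_per_track)
    return cyl, track, sector

print(decode("51", STATUS_BITS))  # ['rdy', 'seekdone', 'err']
print(decode("4", ERROR_BITS))    # ['abort']
print(to_chs(1864576))            # (1973, 1, 28)
print(to_chs(1864577))            # (1973, 1, 29)
```

Decoded this way, every hard error above is the same block (cn 1973, tn 1,
sn 28) with only the abort bit set in the error register.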

