Date:      Sun, 13 Sep 1998 12:37:06 -0700 (PDT)
From:      "Michael V. Harding" <mvh@ix.netcom.com>
To:        dbabler@Rigel.orionsys.com
Cc:        freebsd-questions@FreeBSD.ORG, freebsd-stable@FreeBSD.ORG
Subject:   Re: HELP! IDE Hard Errors?
Message-ID:  <199809131937.MAA16992@netcom1.netcom.com>
In-Reply-To: <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com> (message from David Babler on Sun, 13 Sep 1998 11:34:49 -0700 (PDT))
References:   <Pine.BSF.4.02.9809131053490.785-100000@Rigel.orionsys.com>

I had similar problems with a 1.6GB WD IDE drive, including "fixing" the
drive and having it run fine for a while. The drive did eventually
catch fire and burn to the ground (figuratively).  I don't think there
is anything wrong with the kernel...

Mike Harding

   Date: Sun, 13 Sep 1998 11:34:49 -0700 (PDT)
   From: David Babler <dbabler@Rigel.orionsys.com>
   cc: freebsd-stable@FreeBSD.ORG
   Content-Type: TEXT/PLAIN; charset=US-ASCII
   Sender: owner-freebsd-stable@FreeBSD.ORG
   X-Loop: FreeBSD.ORG


   I've been getting kernel error messages relating to one of my IDE hard
   drives. They started out as missed interrupt warnings and escalated to
   hard and soft write errors. After the hard errors appeared, I ran fsck...
   and a couple of minutes later the system rebooted. No log entry, no dump,
   no nuthin'. 

   The same thing happened several months ago with the drive's predecessor (a
   1.6GB WD IDE drive), ending in a panic to replace what seemed an
   unfixable disk. In that case, FBSD reported hard errors and the drive was
   removed. Since the original drive was under warranty, I was all set to
   ship it back to Western Digital. After all, the conventional wisdom is
   that if an IDE drive *reports* a hard error, the drive must be toast
   because all of its hidden reserved sectors have already been used
   up. WD wants you to run their diagnostic program before you do RMAs, so I
   popped the drive into a DOS machine and ran wddiag. It said, sure enough,
   that there were bad sectors. It then asked if I WANTED TO FIX THEM! Uh,
   okay, I said yes - what did I have to lose? The "bad" sectors went away
   and the drive passed days of running drive tests. It's been running
   without a hitch ever since in a Novell file server.

   My question is this: is there something REALLY wrong here, or is there
   some facet of FBSD (or the kernel interaction with the drive) that is
   preventing normal operation? If you look at the message log below, the
   SAME section of the disk is reported over and over - isn't there a
   mechanism to map that as bad and never look at it again? Is the kernel
   disabling some IDE "smarts" and so preventing the drive from doing
   automatic repairs? Do I really need to down the system, boot DOS and run
   the Western Digital diagnostics to fix this? Seems like a giant leap
   backward to me.

   I tried to run bad144 on the first disk that showed this problem without
   success (the table was bad, corrupted or didn't exist - I forget the
   details now). Also tried badsect with no good result.

   Heeeeeelllppp! How do I fix this?

   -Dave

   ---- System

   FreeBSD 2.2.6-STABLE #0: Wed Jun  3 11:42:03 PDT 1998
   CPU: i486 DX2 (486-class CPU)
     Origin = "GenuineIntel"  Id = 0x435  Stepping=5
     Features=0x3<FPU,VME>
   real memory  = 20971520 (20480K bytes)
   avail memory = 18386944 (17956K bytes)

   wd1: 4924MB (10085040 sectors), 10672 cyls, 15 heads, 63 S/T, 512 B/S

   > disklabel /wd1
   # /dev/rwd1c:
   type: ESDI
   disk: wd1s1
   label: 
   flags:
   bytes/sector: 512
   sectors/track: 63
   tracks/cylinder: 15
   sectors/cylinder: 945
   cylinders: 10671
   sectors/unit: 10084977
   rpm: 3600
   interleave: 1
   trackskew: 0
   cylinderskew: 0
   headswitch: 0		# milliseconds
   track-to-track seek: 0	# milliseconds
   drivedata: 0 

   8 partitions:
   #        size   offset    fstype   [fsize bsize bps/cpg]
     c: 10084977        0    unused        0     0       	# (Cyl.    0 - 10671*)
     e:  5120000        0    4.2BSD        0     0     0 	# (Cyl.    0 - 5417*)
     f:  4964977  5120000    4.2BSD        0     0     0 	# (Cyl. 5417*- 10671*)
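
The probe line and the disklabel can be cross-checked with a little
arithmetic. The sketch below uses only numbers printed above; the variable
names are made up for illustration, and reading the 63-sector difference as
one reserved track at the start of the disk is an assumption, not something
the output states.

```python
# Geometry from the kernel probe line for wd1.
CYLS, HEADS, SECTORS_PER_TRACK, BYTES_PER_SECTOR = 10672, 15, 63, 512

sectors_per_cyl = HEADS * SECTORS_PER_TRACK      # disklabel "sectors/cylinder"
total_sectors = CYLS * sectors_per_cyl           # probe line sector count
size_mb = total_sectors * BYTES_PER_SECTOR // (1024 * 1024)

# Values from the disklabel output.
label_sectors = 10084977                         # "sectors/unit" and size of c:
part_e, part_f = 5120000, 4964977                # sizes of e: and f:

# Sectors on the disk that the label does not cover
# (assumed here to be one track kept free at the start of the disk).
uncovered = total_sectors - label_sectors

print(sectors_per_cyl, total_sectors, size_mb)
print(uncovered, part_e + part_f == label_sectors)
```

The products agree with the probe line (945 sectors per cylinder, 10085040
sectors, 4924MB), and e: plus f: exactly fill c:, so the label itself looks
internally consistent.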

   ---- message log

   Sep 11 11:53:37 Rigel /kernel: wd1: status 50<rdy,seekdone> error 0
   Sep 11 11:56:15 Rigel /kernel: wd1: interrupt timeout:
   Sep 11 11:56:16 Rigel /kernel: wd1: status 58<rdy,seekdone,drq> error 1<no_dam>
   Sep 11 20:51:14 Rigel /kernel: wd1: wdunwedge failed:
   Sep 11 20:51:20 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
   Sep 12 02:06:57 Rigel /kernel: wd1: wdunwedge failed:
   Sep 12 02:07:02 Rigel /kernel: wd1: status 80<busy> error 1<no_dam>
   Sep 12 18:18:38 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   Sep 12 18:19:16 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   Sep 12 18:19:46 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   Sep 12 18:20:25 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   Sep 12 18:20:54 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of 
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   [system rebooted]
   Sep 13 10:28:46 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
    1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
    wd1: status 50<rdy,seekdone> error 4<abort>
   Sep 13 10:29:20 Rigel /kernel: wd1e: soft error writing fsbn 1864577 of
    1864576-1864591 (wd1 bn 1864577; cn 1973 tn 1 sn 29)
    wd1: status 50<rdy,seekdone> error 4<abort>
   Sep 13 10:29:53 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
   Sep 13 10:30:24 Rigel /kernel: wd1e: hard error writing fsbn 1864576 of
    1864576-1864591 (wd1 bn 1864576; cn 1973 tn 1 sn 28)
    wd1: status 51<rdy,seekdone,err> error 4<abort>
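
The log entries above can be decoded by hand, which helps confirm that every
error points at the same physical spot. The following is a minimal sketch: it
assumes the status and error values are hexadecimal register dumps, and the
bit tables contain only the names that actually appear in this log (any other
bit is printed as a raw hex mask). The function names are hypothetical and not
taken from the kernel.

```python
import re

HEADS, SECTORS_PER_TRACK = 15, 63  # geometry from the wd1 probe line

# Only bits that show up in the log above; unknown bits fall back to hex.
STATUS_BITS = {0x80: "busy", 0x40: "rdy", 0x10: "seekdone", 0x08: "drq", 0x01: "err"}
ERROR_BITS = {0x04: "abort", 0x01: "no_dam"}


def block_to_chs(bn):
    """Split a block number into (cylinder, track, sector), sector 0-based."""
    cyl, rest = divmod(bn, HEADS * SECTORS_PER_TRACK)
    track, sector = divmod(rest, SECTORS_PER_TRACK)
    return cyl, track, sector


def decode(value, names):
    """Name the set bits of an 8-bit register value, highest bit first."""
    out = []
    for bit in (0x80 >> i for i in range(8)):
        if value & bit:
            out.append(names.get(bit, hex(bit)))
    return out


def parse_status(line):
    """Extract (status, error) as integers from a 'status NN<..> error N' line."""
    m = re.search(r"status ([0-9a-f]+)<[^>]*> error ([0-9a-f]+)", line)
    return (int(m.group(1), 16), int(m.group(2), 16)) if m else None


status, error = parse_status("wd1: status 51<rdy,seekdone,err> error 4<abort>")
print(block_to_chs(1864576), decode(status, STATUS_BITS), decode(error, ERROR_BITS))
```

Block 1864576 works out to cylinder 1973, track 1, sector 28, the same
cn/tn/sn the kernel prints, so the repeated hard errors and the later soft
errors at 1864577 fall on adjacent sectors of one track.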


   To Unsubscribe: send mail to majordomo@FreeBSD.org
   with "unsubscribe freebsd-stable" in the body of the message




