Date:      Sat, 1 Feb 1997 16:03:46 +0100
From:      j@uriah.heep.sax.de (J Wunsch)
To:        Don.Lewis@tsc.tdk.com (Don Lewis)
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: SCSI disk MEDIUM ERROR with a few twists
Message-ID:  <Mutt.19970201160346.j@uriah.heep.sax.de>
In-Reply-To: <199702011424.GAA28908@salsa.gv.tsc.tdk.com>; from Don Lewis on Feb 1, 1997 06:24:59 -0800
References:  <199702011424.GAA28908@salsa.gv.tsc.tdk.com>

As Don Lewis wrote:

> } It could be the drive itself.
> 
> The MEDIUM ERROR itself and the falling offline a week or so later
> are definitely the fault of the drive.  That the error wasn't reported
> to userland lies somewhere between the driver and userland, inclusive.

See my other mail.  For buffered (filesystem) writes, it's no
surprise.  Reads should, however, always report it.

> Jan 18 04:30:33 news /kernel: sd0(ahc0:0:0): MEDIUM ERROR info:14683a asc:11,0 Unrecovered read error field replaceable unit: ea sks:80,11

> Always the same info:#.

Which means: always the same block # (in hex).
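To make the hex-to-decimal step concrete, here is a minimal sketch that pulls the info: field out of the syslog line quoted above and converts it. The function name info_block and the regular expression are illustrative assumptions, not part of any FreeBSD tool.

```python
import re

# Syslog line copied from the quoted report above.
LINE = ("Jan 18 04:30:33 news /kernel: sd0(ahc0:0:0): MEDIUM ERROR "
        "info:14683a asc:11,0 Unrecovered read error "
        "field replaceable unit: ea sks:80,11")

def info_block(line):
    """Return the block number from the info: field, or None if absent."""
    # The field is printed in hex without a 0x prefix.
    m = re.search(r"\binfo:([0-9a-fA-F]+)\b", line)
    return int(m.group(1), 16) if m else None

block = info_block(LINE)
print(block)  # 1337402
```

Comparing this number across several log entries is a quick way to confirm that the failures all hit the same block.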

> I also can't quote messages from its death throes before it wedged,
> because this disk also contains /var and nothing was syslogged until
> after I got the machine running multi-user again.  I *think* the message
> was: "Logical unit is in process of becoming ready", but if so it was
> lying.

Btw., you should no longer see this error message now.  This case is
retried forever, until it either turns into a `real' error, or
eventually succeeds.

> It gave me at least two weeks warning last time.  If it gets sick again,
> then I can at least file a more complete report ;-)  Are there any
> experiments you want me to try?

Well, you could see why the read error isn't reported to userland
then. :-)

> } Also, go through SCSI reformatting it.  This will cause the drive to
> } recreate the bad sector table as necessary.  You can even do this
> } without using the adapter BIOS, there's always /sbin/scsiformat for
> } this.
> 
> The painful part is that this is the root disk, and I'm pretty sure the
> 2.1.x fixit disk doesn't contain scsiformat.

scsiformat is simple:

	scsi -s 7200 -f /dev/rsdX.ctl -c "4 0 0 0 0 0"

(Put it into the background if you prefer; once started, you can't break
it with ^Z.)
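For readers unfamiliar with the argument to -c: the six bytes form a SCSI command descriptor block whose first byte, 0x04, is the FORMAT UNIT opcode in SCSI-2. The sketch below only decodes that string. It assumes the bytes are written in hex and that -s 7200 is a timeout in seconds (two hours); parse_cdb and the OPCODES table are hypothetical helpers for illustration.

```python
# Opcode names from SCSI-2; only the two relevant to this thread are listed.
OPCODES = {0x04: "FORMAT UNIT", 0x07: "REASSIGN BLOCKS"}

def parse_cdb(text):
    """Turn a string of hex bytes, as given after -c, into a bytes CDB."""
    # Assumption: each whitespace-separated token is one byte in hex.
    return bytes(int(tok, 16) for tok in text.split())

cdb = parse_cdb("4 0 0 0 0 0")
print(len(cdb), OPCODES.get(cdb[0], "unknown"))  # 6 FORMAT UNIT
```

All remaining bytes are zero, which requests a plain format with no additional defect list supplied by the host.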

>  Doesn't remapping the sector
> add the original to the drive's grown defect list?

Yes, but IMHO reformatting often does a more complete check, so if an
adjacent sector is flaky, it is more likely to be put there as well.

We need a remapping tool as well.  Anybody here who ever dealt with
defect list management?  Since we do already know the block number
(from the info field in the syslog message), it should be easy to add
it to the defect list.
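One possible starting point for such a tool is the SCSI-2 REASSIGN BLOCKS command (opcode 0x07), which takes a list of logical block addresses to remap. The sketch below only builds the bytes and sends nothing to a device. It assumes the short form of the defect list (a 4-byte header followed by 4-byte big-endian block addresses) and assumes the info value from the syslog message is the logical block address the drive expects. The function name is hypothetical.

```python
import struct

def reassign_blocks_payload(lbas):
    """Build a SCSI-2 REASSIGN BLOCKS defect list (4-byte LBA form)."""
    # Each defect descriptor is one 32-bit big-endian block address.
    body = b"".join(struct.pack(">I", lba) for lba in lbas)
    # Header: two reserved bytes, then the defect list length in bytes.
    return struct.pack(">HH", 0, len(body)) + body

# 6-byte CDB: opcode 0x07, all other bytes zero.
REASSIGN_BLOCKS_CDB = bytes([0x07, 0, 0, 0, 0, 0])

# Block number taken from the info: field in the syslog message above.
payload = reassign_blocks_payload([0x14683A])
print(payload.hex())  # 000000040014683a
```

A real tool would still have to pass the CDB and this payload to the drive and handle the sense data if the reassignment fails.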

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


