From owner-freebsd-scsi  Thu Jun  6 04:42:45 1996
Return-Path: owner-freebsd-scsi
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id EAA20229
          for freebsd-scsi-outgoing; Thu, 6 Jun 1996 04:42:45 -0700 (PDT)
Received: from hda (ip86-max1-fitch.zipnet.net [199.232.245.86])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id EAA20208;
          Thu, 6 Jun 1996 04:42:39 -0700 (PDT)
Received: (from dufault@localhost) by hda (8.6.11/8.6.9) id HAA17307; Thu, 6 Jun 1996 07:46:38 -0400
From: Peter Dufault <dufault@hda>
Message-Id: <199606061146.HAA17307@hda>
Subject: Re: ERROR info:747d9d asc:11,0 Unrecovered read error, other SCSI issues
To: dror@hopf.dnai.com
Date: Thu, 6 Jun 1996 07:46:36 -0400 (EDT)
Cc: freebsd-scsi@freebsd.org, freebsd-isp@freebsd.org
In-Reply-To: <Pine.NEB.3.93.960605172234.999K-100000@mars.dnai.com> from "Dror Matalon" at Jun 5, 96 08:45:26 pm
Reply-to: hdalog@zipnet.net
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> 
> Hi Folks,
> 
> We're using an NCR 53c825 wide SCSI controller with 3
> Quantum XP34300W 4.3 GB Fast Wide SCSI drives on a Pentium 133
> as our news machine.
> 
> We started getting the messages:
> sd0(ncr0:0:0): MEDIUM ERROR info:7476a9 asc:11,0 Unrecovered read error, retries:4
> sd0(ncr0:0:0): MEDIUM ERROR info:7476a9 asc:11,0 Unrecovered read error, retries:3
> ... 
> 

This indicates that the drive can't read those blocks.  Even if
you have automatic read and write reallocation enabled, the drive
won't reallocate a block on an unrecovered read failure.  You should:

1. Check that AWRE and ARRE are on in mode page 1 (the read-write
error recovery page); see scsi(8) for how to use the mode page
editor to check and change them;

2. Once these are on, you can map out the block by writing anything
to it.  Note that this changes the data on the disk.  Alternatively,
this UNTESTED script, which issues a SCSI REASSIGN BLOCKS command,
should map it out (sorry, I don't have any disks with bad blocks
online):

++Start of scsiremap:

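#!/bin/sh
# scsiremap: ask a SCSI disk to reassign (map out) one bad block.
PATH="/sbin:/usr/sbin:/bin:/usr/bin"; export PATH

usage()
{
	echo "Usage: scsiremap raw-device-name block" 1>&2
	exit 2
}

if [ $# -ne 2 ] ; then
	usage
fi

RAW=$1
BLOCK=$2

if [ "x$RAW" = "x" ] || [ "x$BLOCK" = "x" ] ; then
	usage
fi

if expr "$RAW" : 'sd[0-9][0-9]*$' > /dev/null ; then
	# generic disk name given, convert to control device name
	RAW="/dev/r${RAW}.ctl"
fi

# CDB "7 0 0 0 0 0" is REASSIGN BLOCKS (opcode 0x07).  The 8-byte
# parameter list is a 4-byte header (2 reserved bytes, then a 2-byte
# defect list length of 4) followed by the 4-byte block address.
scsi -f "$RAW" -c "7 0 0 0 0 0" -o 8 "0 0 4:i2 v:i4" "$BLOCK"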

--End scsiremap.

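(The block can be given in decimal, or in hex if preceded by 0x.
The info: value in the kernel message is the block address in hex,
so for the errors above you would pass 0x7476a9 or 0x747d9d.)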
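To see exactly what the script hands to the drive before running it
for real, here is a sketch that only prints the 8 parameter bytes for
a given block and sends nothing to the disk.  It assumes the format
string "0 0 4:i2 v:i4" follows the standard REASSIGN BLOCKS layout
(a 4-byte header whose defect list length is 4, then one big-endian
4-byte address); reassign_param_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: print the 8-byte REASSIGN BLOCKS parameter list
# that scsiremap's format string describes, for checking by eye.
# Accepts the block in decimal or, with a 0x prefix, in hex.
reassign_param_bytes()
{
	# POSIX printf evaluates numeric arguments as C constants,
	# so both "7632553" and "0x7476a9" are accepted here.
	lba=$(printf '%d' "$1") || return 1
	# Header: 2 reserved bytes, then a 2-byte big-endian defect
	# list length of 4 (exactly one 4-byte address follows).
	printf '00 00 00 04'
	# The block address, 4 bytes, most significant byte first.
	printf ' %02x %02x %02x %02x\n' \
		$(( (lba >> 24) & 255 )) $(( (lba >> 16) & 255 )) \
		$(( (lba >> 8) & 255 )) $(( lba & 255 ))
}
```

For example, reassign_param_bytes 0x7476a9 should print
00 00 00 04 00 74 76 a9, the same bytes as the decimal form 7632553.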

This isn't a general utility because I don't know that this is the
right thing to do.  What does the drive do with the data it can't
read?  The SCSI spec doesn't say.  Is it better to turn off ECC,
read as much as you can from the block, then write it back, forcing
the slip?  Do you want to restore from backups?  Etc.

> These messages always complain about 2 addresses 7476a9 and 747d9d.
> We suspect that these are on the swap area of the first disk.
> 
> 1. Is there a way to take care of this problem other than swapping
> out the disk? Could we somehow mark these areas as bad?
> 
> 2. My number 1 frustration with FreeBSD/Unix on a PC is related to
> the number of SCSI errors that we're running into. We might be at
> fault for not running the computer room cool enough, and PC-style
> SCSI connectors/cables might also be the problem -- we only
> recently started using Granite's custom made SCSI cables. Still,
> we've had close to a 20% mortality rate on our SCSI disks. Are other
> people also experiencing these kinds of problems?

If you are replacing disks because they are developing read errors,
then:

You have to figure out why they are developing these read errors.

An OS needs a well-thought-out policy for handling read errors that
develop in service, so that they are hidden from the user.
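A first step in figuring out why is seeing which blocks fail and how
often.  A minimal sketch, assuming kernel messages in the format
quoted above (the failing address follows "info:" in hex); bad_blocks
is a hypothetical helper name, and it does not separate addresses by
device, so filter on sd0 and so on first if you have several disks.

```shell
# Hypothetical helper: read kernel messages on stdin and count how
# often each block address appears in an "Unrecovered read error"
# report.  Prints "count 0xADDR decimal" per distinct address.
bad_blocks()
{
	grep 'Unrecovered read error' |
	sed -n 's/.*info:\([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p' |
	sort | uniq -c |
	while read -r count addr ; do
		printf '%s 0x%s %d\n' "$count" "$addr" "0x$addr"
	done
}
```

For example, bad_blocks < /var/log/messages, assuming kernel messages
are logged there.  Addresses that keep coming back are candidates for
scsiremap; a steady stream of new ones points at something else.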

-- 
Peter Dufault               Real-Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267