From owner-freebsd-scsi Thu Jun 6 04:42:45 1996
Return-Path: owner-freebsd-scsi
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id EAA20229 for freebsd-scsi-outgoing; Thu, 6 Jun 1996 04:42:45 -0700 (PDT)
Received: from hda (ip86-max1-fitch.zipnet.net [199.232.245.86]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id EAA20208; Thu, 6 Jun 1996 04:42:39 -0700 (PDT)
Received: (from dufault@localhost) by hda (8.6.11/8.6.9) id HAA17307; Thu, 6 Jun 1996 07:46:38 -0400
From: Peter Dufault
Message-Id: <199606061146.HAA17307@hda>
Subject: Re: ERROR info:747d9d asc:11,0 Unrecovered read error, other SCSI issues
To: dror@hopf.dnai.com
Date: Thu, 6 Jun 1996 07:46:36 -0400 (EDT)
Cc: freebsd-scsi@freebsd.org, freebsd-isp@freebsd.org
In-Reply-To: from "Dror Matalon" at Jun 5, 96 08:45:26 pm
Reply-to: hdalog@zipnet.net
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>
> Hi Folks,
>
> We're using an NCR 53c825 wide scsi controller with 3
> Quantum XP34300W 4.3 Fast Wide SCSI drives on a pentium 133
> as our news machine.
>
> We started getting the messages:
> sd0(ncr0:0:0): MEDIUM ERROR info:7476a9 asc:11,0 Unrecovered read error
> , retries:4
> sd0(ncr0:0:0): MEDIUM ERROR info:7476a9 asc:11,0 Unrecovered read error
> , retries:3
> ...
>

This indicates that the drive can't read those blocks. Even if you
have automatic read and write reallocation enabled the drive won't
reallocate on a read failure. You should:

1. Check that you do have AWRE and ARRE on in mode page 1 - see how
   to use the mode page editor in scsi(8) to check and change this;

2. Once these are on, you can map out the block by writing anything
   to that block. You will change the data on the disk.
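
The two steps above can be sketched roughly as follows. This is only an
illustration: it prints the commands instead of running them. It assumes
that the "info:" value in the kernel message is the hexadecimal block
address on the whole disk, that blocks are 512 bytes, and that /dev/rsd0
and /dev/rsd0.ctl are the right device names; none of that comes from the
messages themselves, and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch only: nothing here touches a disk; commands are printed, not run.
# Assumptions: "info:" is a hex block address on the whole disk, blocks
# are 512 bytes, and the device names used below are illustrative.
#
# Step 1 (mode page 1, AWRE/ARRE) is done with the mode page editor
# described in scsi(8); something like "scsi -f /dev/rsd0.ctl -m 1" to
# view the page, but check the man page on your system for the options.

# Extract the unique hex block addresses from kernel messages on stdin.
bad_blocks() {
    sed -n 's/.*MEDIUM ERROR info:\([0-9a-fA-F]*\) .*/\1/p' | sort -u
}

# Convert a hex block address (given without the 0x prefix) to decimal.
hex_to_dec() {
    echo $((0x$1))
}

# Step 2: print the dd command that would overwrite exactly one block.
# $1 = raw device name, $2 = hex block address
print_overwrite() {
    echo "dd if=/dev/zero of=$1 bs=512 seek=$(hex_to_dec "$2") count=1"
}
```

For example, `print_overwrite /dev/rsd0 7476a9` prints a dd command with
seek=7632553. Whatever was stored in that block is lost once such a
command is actually run, as noted in step 2.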

Alternatively, this UNTESTED script should map it out (sorry, I don't
have any disks with bad blocks on line):

++Start of scsiremap:
#!/bin/sh
PATH="/sbin:/usr/sbin:/bin:/usr/bin"; export PATH

RAW=

usage()
{
	echo "Usage: scsiremap raw-device-name block" 1>&2
	exit 2
}

if [ $# -ne 2 ] ; then
	usage
fi

RAW=$1
BLOCK=$2

if [ "x$RAW" = "x" ] ; then
	usage
fi

if expr "$RAW" : 'sd[0-9][0-9]*$' > /dev/null ; then
	# generic disk name given, convert to control device name
	RAW="/dev/r${RAW}.ctl"
fi

# SCSI opcode 7 is REASSIGN BLOCKS; the 8 data-out bytes carry the
# block address taken from the command line.
scsi -f "$RAW" -c "7 0 0 0 0 0" -o 8 "0 0 4:i2 v:i4" "$BLOCK"
--End scsiremap.

(block can be either decimal or hex if preceded by 0x)

This isn't a general utility because I don't know that this is the
right thing to do. What does the drive do for that data it can't
read? The SCSI spec doesn't say. Is it better to turn off ECC,
read as much as you can from the block, then write it back forcing
the slip? Do you want to restore from backups? Etc.

> These messages always complain about 2 addresses 7476a9 and 747d9d.
> We suspect that these are on the swap area of the first disk.
>
> 1. Is there a way to take care of this problem other than swapping
> out the disk? Could we somehow mark these areas as bad?
>
> 2. My number 1 frustration with FreeBSD/Unix on a PC is related to
> the number of SCSI errors that we're running into. We might be at
> fault for not running the computer room cool enough, and the PC
> type of SCSI connectors/cables might also be the problem -- we only
> recently started using Granite's custom made SCSI cables. Still
> we've had close to 20% mortality rate on our SCSI disks. Are other
> people also experiencing these kinds of problems?

If you are swapping out disks because you are developing read errors
then:

You have to figure out why you are developing these read errors.

An OS needs a well-thought-out policy for handling developed read
errors to hide this from the user.
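
Going back to the scsiremap script: as I read the "0 0 4:i2 v:i4"
template, it sends two zero bytes, the value 4 as a two-byte integer, and
the block address as a four-byte integer. That matches the SCSI-2
REASSIGN BLOCKS parameter list, which is a four-byte header holding the
defect list length followed by one four-byte block address. The sketch
below only prints those 8 bytes so they can be checked by eye; it talks
to no device, and the function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: print the 8 data-out bytes for one block address.
# Layout assumed from SCSI-2 REASSIGN BLOCKS:
#   bytes 0-1  reserved (zero)
#   bytes 2-3  defect list length = 4 (one 4-byte descriptor follows)
#   bytes 4-7  logical block address, most significant byte first

# $1 = block address, decimal or hex with a 0x prefix
reassign_params() {
    b=$(($1))
    printf '00 00 00 04 %02x %02x %02x %02x\n' \
        $(( (b >> 24) & 255 )) $(( (b >> 16) & 255 )) \
        $(( (b >> 8) & 255 )) $(( b & 255 ))
}
```

Running `reassign_params 0x7476a9` gives `00 00 00 04 00 74 76 a9`, which
is what I would expect the script to send for the first failing address.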

--
Peter Dufault               Real-Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax: 508 433 5267