From owner-freebsd-scsi  Wed Jul 21 11: 9:44 1999
Message-Id: <199907211807.MAA83601@panzer.kdm.org>
Subject: Re: error logs
In-Reply-To: <199907211630.JAA00715@dingo.cdrom.com> from Mike Smith at "Jul 21, 1999 09:30:25 am"
To: mike@smith.net.au (Mike Smith)
Date: Wed, 21 Jul 1999 12:07:28 -0600 (MDT)
Cc: asami@cs.berkeley.edu (Satoshi Asami), scsi@FreeBSD.ORG
From: "Kenneth D. Merry" <ken@plutotech.com>

Mike Smith wrote...
> > Hi,
> > 
> > I have a question.  I just saw some errors on the package building
> > machine.  Part of it looks like this:
> > 
> > ===
> >  :
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): MEDIUM ERROR info:3cf816 asc:11,0
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): Unrecovered read error sks:80,9
> 
> This is a fatal read error.  The kernel will retry it.

If it gets retried after this message, the retry comes from above the CAM
layer.  When CAM prints an error message, it is almost always after all
retries have been exhausted.  Read and write commands issued by the da
driver have a retry count of 4.
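The point that CAM logs only after retries run out can be shown with a toy bounded-retry loop. This is a minimal sketch and is not CAM's code. The names `issue_with_retries`, `issue_fn`, `always_fails` and `succeeds_on` are hypothetical. It also assumes that "retry count of 4" means four retries after the first attempt, so at most five attempts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome of one command attempt; not a CAM type. */
enum outcome { CMD_OK, CMD_ERROR };

/* Hypothetical stand-in for sending a command to the disk:
 * returns the outcome of attempt number `attempt` (0-based). */
typedef enum outcome (*issue_fn)(int attempt, void *ctx);

/* Issue a command, retrying up to `retries` extra times on error.
 * The error is reported once, only after the final attempt fails.
 * Returns the number of attempts made; *reported is 1 if an error
 * message was printed, else 0. */
static int issue_with_retries(issue_fn issue, void *ctx, int retries,
                              int *reported)
{
    int attempt;

    assert(issue != NULL && reported != NULL && retries >= 0);
    *reported = 0;
    for (attempt = 0; attempt <= retries; attempt++) {
        if (issue(attempt, ctx) == CMD_OK)
            return attempt + 1;     /* success: nothing is logged */
    }
    fprintf(stderr, "command failed after %d attempts\n", attempt);
    *reported = 1;
    return attempt;
}

/* Example stand-in: every attempt fails (an unrecoverable block). */
static enum outcome always_fails(int attempt, void *ctx)
{
    (void)attempt;
    (void)ctx;
    return CMD_ERROR;
}

/* Example stand-in: succeeds once `attempt` reaches *(int *)ctx. */
static enum outcome succeeds_on(int attempt, void *ctx)
{
    return attempt >= *(int *)ctx ? CMD_OK : CMD_ERROR;
}
```

Suppose a command fails on every attempt with a retry count of 4. It is issued five times and the error is printed once. If it succeeds on the third attempt, nothing is printed.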

> > Jul 21 02:25:40 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0 
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): RECOVERED ERROR info:3cf817 asc:17,2
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): Recovered data with positive head offset sks:80,2
> >  :
> 
> This is the kernel-instigated retry, note that the read10 command is 
> the same.  The drive reports that it was able to recover the data but 
> needed to adjust the head position in order to do so.

The read command is the same, but the block referred to in this error
message is different from the one above.  Compare the info fields: 3cf816
in the first error, 3cf817 here.  The READ(10) CDB above has a transfer
length of two blocks, so it covers both of those blocks.

> > ===
> > 
> > I assume the stuff after "CDB:" is the entire SCSI command (10-byte
> > commands?), does this mean that the kernel got a medium error from the
> > disk, retried the exact same read command and succeeded the second
> > time, even though the disk had to do some internal fiddling ("positive
> > head offset")?
> > 
> > I also see a bunch of recovered error messages with no associated
> > medium error messages.  This probably means the disk is dying, right?
> 
> It at least means that it's grown some defects.  What I'm not seeing 
> are any additions to the grown defects list, despite ARRE being set.  8(

Read reallocation only works if the disk managed to salvage the data.  If
it can't salvage the data, it can't reallocate it.  Write reallocation,
IMO, should be successful much more often, because the kernel has the good
data already.
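The reallocation argument can be written as a toy decision function. ARRE and AWRE are bits 6 and 7 of byte 2 in the Read-Write Error Recovery mode page (page 0x01). The sketch covers only the reasoning above. It assumes a drive that follows those enable bits and nothing else; real firmware may apply its own rules. `struct block_event` and `can_auto_reallocate` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits in byte 2 of the Read-Write Error Recovery mode page (0x01). */
#define RWER_AWRE 0x80  /* automatic write reallocation enabled */
#define RWER_ARRE 0x40  /* automatic read reallocation enabled */

/* Hypothetical summary of an access that hit a bad block. */
struct block_event {
    bool is_write;        /* true for a write, false for a read */
    bool data_recovered;  /* reads only: did the drive salvage the data? */
};

/* Toy model: can the drive reallocate this block on its own?
 * `rwer_byte2` is byte 2 of mode page 0x01. */
static bool can_auto_reallocate(uint8_t rwer_byte2, struct block_event ev)
{
    if (ev.is_write)
        /* The host supplies the data, so nothing has to be salvaged. */
        return (rwer_byte2 & RWER_AWRE) != 0;
    /* A read can only be reallocated if the drive recovered the data. */
    return (rwer_byte2 & RWER_ARRE) != 0 && ev.data_recovered;
}
```

Setting ARRE alone does not help with an unrecovered read. In this model the only way to fix that block is a later write with AWRE set.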

Ken
-- 
Kenneth Merry
ken@plutotech.com

