From owner-freebsd-scsi Wed Jul 21 11: 9:44 1999
Delivered-To: freebsd-scsi@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
    by hub.freebsd.org (Postfix) with ESMTP id F412814C4C
    for ; Wed, 21 Jul 1999 11:09:37 -0700 (PDT)
    (envelope-from ken@panzer.kdm.org)
Received: (from ken@localhost)
    by panzer.kdm.org (8.9.3/8.9.1) id MAA83601;
    Wed, 21 Jul 1999 12:07:28 -0600 (MDT)
    (envelope-from ken)
Message-Id: <199907211807.MAA83601@panzer.kdm.org>
Subject: Re: error logs
In-Reply-To: <199907211630.JAA00715@dingo.cdrom.com> from Mike Smith at "Jul 21, 1999 09:30:25 am"
To: mike@smith.net.au (Mike Smith)
Date: Wed, 21 Jul 1999 12:07:28 -0600 (MDT)
Cc: asami@cs.berkeley.edu (Satoshi Asami), scsi@FreeBSD.ORG
From: "Kenneth D. Merry"
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Mike Smith wrote...
> > Hi,
> >
> > I have a question.  I just saw some errors on the package building
> > machine.  Part of it looks like this:
> >
> > ===
> >  :
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): MEDIUM ERROR info:3cf816 asc:11,0
> > Jul 21 02:25:39 bento /kernel: (da7:ahc1:0:4:0): Unrecovered read error sks:80,9
>
> This is a fatal read error.  The kernel will retry it.

If it gets retried, it gets retried above the CAM layer.  When CAM prints
out an error message, it almost always is after all retries have been
completed.  Read and write commands from the da driver have a retry count
of 4.

> > Jul 21 02:25:40 bento /kernel: (da7:ahc1:0:4:0): READ(10). CDB: 28 0 0 3c f8 16 0 0 2 0
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): RECOVERED ERROR info:3cf817 asc:17,2
> > Jul 21 02:25:41 bento /kernel: (da7:ahc1:0:4:0): Recovered data with positive head offset sks:80,2
> >  :
>
> This is the kernel-instigated retry, note that the read10 command is
> the same.  The drive reports that it was able to recover the data but
> needed to adjust the head position in order to do so.

The read command is the same, but the block referred to in this error
message is different than the one above.  See the info field.  The read
cdb above is two blocks in length.

> > ===
> >
> > I assume the stuff after "CDB:" is the entire SCSI command (10-byte
> > commands?), does this mean that the kernel got a medium error from the
> > disk, retried the exact same read command and succeeded the second
> > time, even though the disk had to do some internal fiddling ("positive
> > head offset")?
> >
> > I also see a bunch of recovered error messages with no associated
> > medium error messages.  This probably means the disk is dying, right?
>
> It at least means that it's grown some defects.  What I'm not seeing
> are any additions to the grown defects list, despite ARRE being set. 8(

Read reallocation only works if the disk managed to salvage the data.  If
it can't salvage the data, it can't reallocate it.

Write reallocation, IMO, should be successful much more often, because the
kernel has the good data already.

Ken
-- 
Kenneth Merry
ken@plutotech.com


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
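
To make the point about the info field concrete, here is a minimal Python
sketch that decodes the READ(10) CDB from the log lines quoted above and
checks where the two info values fall.  It assumes the standard READ(10)
layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer
length in bytes 7-8).  The helper name decode_read10 is made up for
illustration; it is not a CAM or kernel function.

```python
# Illustrative sketch only: decode the READ(10) CDB printed in the kernel
# log and see which blocks the two "info:" values refer to.
# decode_read10 is a hypothetical helper, not part of CAM.

def decode_read10(cdb_text):
    """Parse a CDB printed as space-separated hex bytes, as in the log."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    # Bytes 2-5: starting logical block address, big-endian.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7-8: transfer length in blocks, big-endian.
    length = int.from_bytes(bytes(cdb[7:9]), "big")
    return lba, length

lba, length = decode_read10("28 0 0 3c f8 16 0 0 2 0")
blocks = list(range(lba, lba + length))
print("start LBA: %x, length: %d" % (lba, length))

# info values taken from the MEDIUM ERROR and RECOVERED ERROR log lines
for info in (0x3cf816, 0x3cf817):
    print("info %x is block %d of the transfer" % (info, blocks.index(info) + 1))
```

Under those assumptions the CDB starts at block 3cf816 and covers two
blocks, so the MEDIUM ERROR (info:3cf816) is on the first block of the
transfer and the RECOVERED ERROR (info:3cf817) is on the second.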