From owner-freebsd-stable@FreeBSD.ORG Mon Dec 17 23:52:10 2012
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id B352CDC
    for ; Mon, 17 Dec 2012 23:52:10 +0000 (UTC)
    (envelope-from wjw@digiware.nl)
Received: from mail.digiware.nl (unknown [IPv6:2001:4cb8:90:ffff::3])
    by mx1.freebsd.org (Postfix) with ESMTP id 1523B8FC18
    for ; Mon, 17 Dec 2012 23:52:09 +0000 (UTC)
Received: from rack1.digiware.nl (localhost.digiware.nl [127.0.0.1])
    by mail.digiware.nl (Postfix) with ESMTP id 05607153435;
    Tue, 18 Dec 2012 00:52:09 +0100 (CET)
X-Virus-Scanned: amavisd-new at digiware.nl
Received: from mail.digiware.nl ([127.0.0.1])
    by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id N2lBUxtQC4jl; Tue, 18 Dec 2012 00:52:07 +0100 (CET)
Received: from [192.168.10.10] (vaio [192.168.10.10])
    (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
    (No client certificate requested)
    by mail.digiware.nl (Postfix) with ESMTPSA id F357A153434;
    Tue, 18 Dec 2012 00:52:06 +0100 (CET)
Message-ID: <50CFB026.3010102@digiware.nl>
Date: Tue, 18 Dec 2012 00:52:06 +0100
From: Willem Jan Withagen
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20121026 Thunderbird/16.0.2
MIME-Version: 1.0
To: Jim Harris
Subject: Re: Strange CAM errors
References: <50CEFAC5.8000002@digiware.nl> <572946ED30FA47C69D6DCDD511CF6EB2@multiplay.co.uk>
    <50CF47A5.4090008@digiware.nl> <50CF925C.5040106@digiware.nl> <50CF9ADD.7080202@digiware.nl>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: FreeBSD Stable Users
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Dec 2012 23:52:10 -0000

On 17-12-2012 23:43, Jim Harris wrote:
> On Mon, Dec 17, 2012 at 3:21 PM, Willem Jan Withagen wrote:
>
>> On 17-12-2012 23:10, Jim Harris wrote:
>>>
>>>
>>> On Mon, Dec 17, 2012 at 2:45 PM, Willem Jan Withagen wrote:
>>>
>>> On 17-12-2012 20:16, Jim Harris wrote:
>>> > The timeouts are occurring on inquiry commands to non-zero LUNs.
>>> > arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for
>>> > inquiry commands to this device and LUN > 0. CAM_DEV_NOT_THERE is
>>> > preferred to remove these types of warnings, and similar patches have
>>> > gone into for other SCSI drivers recently.
>>> >
>>> > Can you try this patch?
>>> >
>>> > Index: sys/dev/arcmsr/arcmsr.c
>>> > ===================================================================
>>> > --- sys/dev/arcmsr/arcmsr.c (revision 244190)
>>> > +++ sys/dev/arcmsr/arcmsr.c (working copy)
>>> > @@ -2439,7 +2439,7 @@
>>> >          char *buffer=pccb->csio.data_ptr;
>>> >
>>> >          if (pccb->ccb_h.target_lun) {
>>> > -            pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
>>> > +            pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
>>> >              xpt_done(pccb);
>>> >              return;
>>> >          }
>>> >
>>>
>>> Hi Jim,
>>>
>>> The noise has gone down by a factor of 5, now I get:
>>>
>>> (probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
>>> (probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
>>> (probe6:arcmsr0:0:16:1): Error 5, Unretryable error
>>> (probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0
>>>
>>> Which is defined in sys/cam/cam.c ....
>>> as CAM_UA_TERMIO, but that error is nowhere set in the arcmsr code....
>>>
>>>
>>> There is something out of sync on your system. I just noticed this, but
>>> your original error messages were showing "Command timeout"
>>> (CAM_CMD_TIMEOUT) even though the driver was returning CAM_SEL_TIMEOUT.
>>> Now in this case, driver is returning CAM_DEV_NOT_THERE, but CAM is
>>> printing error message for CAM_UA_TERMIO. In both cases, driver is
>>> returning value X, but cam is interpreting it as X+1. So CAM and
>>> arcmsr(4) seem to have a different idea of the values of the cam_status
>>> enumeration.
>>>
>>> Can you provide details on your build environment? Are you building
>>> arcmsr as a loadable module or do you specify "device arcmsr" in your
>>> kernel config to link it statically? I'm suspecting loadable module,
>>> although I have no idea how these values would get out of sync since
>>> this enumeration hasn't changed in probably 10+ years.
>>
>> arcmsr is build in the kernel
>>
>> [/usr/src] wjw@zfs.digiware.nl> kldstat
>> Id Refs Address            Size     Name
>>  1   28 0xffffffff80200000 b55be0   kernel
>>  2    1 0xffffffff80d56000 6138     nullfs.ko
>>  3    1 0xffffffff80d5d000 2153b0   zfs.ko
>>  4    2 0xffffffff80f73000 5e38     opensolaris.ko
>>  5    1 0xffffffff80f79000 f510     aio.ko
>>  6    1 0xffffffff80f89000 2a20     coretemp.ko
>>  7    1 0xffffffff81012000 316d4    nfscl.ko
>>  8    2 0xffffffff81044000 10827    nfscommon.ko
>>
>> And I just refetched 9.1-PRERELEASE this afternoon over svn....
>>
>> Could this have something to do with Clang <> gcc ????
>> Not that I did anything to change this.
>>
>> Note that I have nothing changed other than the KERNEL CONFIG file.
>>
>> And both kernel and world were build at the same time this afternoon.
>> With your patch I just only rebuild kernel and modules.
>>
>>
> Never mind my earlier comment on out-of-sync. It's another bug in
> arcmsr(4) - CAM_REQ_CMP == 0x1, and in the LUN > 0 case here it OR's the
> status values together, causing the off-by-one issue we were seeing.
>
> Please try the following patch instead (reverting earlier patch):
>
> Index: sys/dev/arcmsr/arcmsr.c
> ===================================================================
> --- sys/dev/arcmsr/arcmsr.c (revision 244190)
> +++ sys/dev/arcmsr/arcmsr.c (working copy)
> @@ -2432,14 +2432,13 @@
>  static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb,
>          union ccb * pccb)
>  {
> -    pccb->ccb_h.status |= CAM_REQ_CMP;
>      switch (pccb->csio.cdb_io.cdb_bytes[0]) {
>      case INQUIRY: {
>          unsigned char inqdata[36];
>          char *buffer=pccb->csio.data_ptr;
>
>          if (pccb->ccb_h.target_lun) {
> -            pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> +            pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
>              xpt_done(pccb);
>              return;
>          }
> @@ -2455,6 +2454,7 @@
>          strncpy(&inqdata[16], "RAID controller ", 16); /* Product Identification */
>          strncpy(&inqdata[32], "R001", 4); /* Product Revision */
>          memcpy(buffer, inqdata, sizeof(inqdata));
> +        pccb->ccb_h.status |= CAM_REQ_CMP;
>          xpt_done(pccb);
>      }
>      break;
> @@ -2464,10 +2464,12 @@
>              pccb->ccb_h.status |= CAM_SCSI_STATUS_ERROR;
>              pccb->csio.scsi_status = SCSI_STATUS_CHECK_COND;
>          }
> +        pccb->ccb_h.status |= CAM_REQ_CMP;
>          xpt_done(pccb);
>      }
>      break;
>      default:
> +        pccb->ccb_h.status |= CAM_REQ_CMP;
>          xpt_done(pccb);
>      }
>  }

Right,
That did the trick.....

Thanx for the code.

--WjW
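
Note on the off-by-one described in the thread: the status codes are plain enumeration values rather than independent bit flags, so OR-ing CAM_REQ_CMP (0x1) into a status that later also receives an error code yields the next value in the enumeration. The sketch below is only an illustration of that arithmetic in userland C, not driver code. The DEMO_* names, the helper functions, and the numeric values other than CAM_REQ_CMP == 0x1 are assumptions (the values are recalled from sys/cam/cam.h and should be checked against your own tree).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Local stand-in for a few cam_status values. Only CAM_REQ_CMP == 0x1 is
 * stated in the thread; the other numbers are assumed from sys/cam/cam.h.
 */
enum demo_cam_status {
	DEMO_CAM_REQ_INPROG    = 0x00,
	DEMO_CAM_REQ_CMP       = 0x01,
	DEMO_CAM_DEV_NOT_THERE = 0x08,
	DEMO_CAM_UA_TERMIO     = 0x09,
	DEMO_CAM_SEL_TIMEOUT   = 0x0a,
	DEMO_CAM_CMD_TIMEOUT   = 0x0b
};

/* Pre-patch pattern: CAM_REQ_CMP is OR'ed in on entry, the error later. */
static unsigned int
status_old(unsigned int error)
{
	unsigned int status = DEMO_CAM_REQ_INPROG;

	status |= DEMO_CAM_REQ_CMP;	/* set unconditionally at the top */
	status |= error;		/* error path ORs on top of it */
	return (status);
}

/* Patched pattern: each exit path ORs in exactly one status code. */
static unsigned int
status_patched(unsigned int code)
{
	unsigned int status = DEMO_CAM_REQ_INPROG;

	status |= code;
	return (status);
}

/* Hypothetical helper: print both results for the two codes in the thread. */
void
demo_check(void)
{
	/* 0x0a | 0x01 == 0x0b: SEL_TIMEOUT is reported as CMD_TIMEOUT. */
	assert(status_old(DEMO_CAM_SEL_TIMEOUT) == DEMO_CAM_CMD_TIMEOUT);
	/* 0x08 | 0x01 == 0x09: DEV_NOT_THERE is reported as UA_TERMIO. */
	assert(status_old(DEMO_CAM_DEV_NOT_THERE) == DEMO_CAM_UA_TERMIO);

	printf("old:     0x%02x 0x%02x\n",
	    status_old(DEMO_CAM_SEL_TIMEOUT),
	    status_old(DEMO_CAM_DEV_NOT_THERE));
	printf("patched: 0x%02x 0x%02x\n",
	    status_patched(DEMO_CAM_SEL_TIMEOUT),
	    status_patched(DEMO_CAM_DEV_NOT_THERE));
}
```

Calling demo_check() from a small main() should show the shifted values for the old pattern and the intended ones for the patched pattern, which matches both symptoms reported above ("Command timeout" before the first patch, "Unable to terminate I/O CCB request" after it).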