From: Jim Harris
To: Willem Jan Withagen
Cc: FreeBSD Stable Users
Date: Mon, 17 Dec 2012 15:43:09 -0700
Subject: Re: Strange CAM errors
In-Reply-To: <50CF9ADD.7080202@digiware.nl>
References: <50CEFAC5.8000002@digiware.nl> <572946ED30FA47C69D6DCDD511CF6EB2@multiplay.co.uk> <50CF47A5.4090008@digiware.nl> <50CF925C.5040106@digiware.nl> <50CF9ADD.7080202@digiware.nl>
List-Id: Production branch of FreeBSD source code

On Mon, Dec 17, 2012 at 3:21 PM, Willem Jan Withagen wrote:

> On 17-12-2012 23:10, Jim Harris wrote:
> > On Mon, Dec 17, 2012 at 2:45 PM, Willem Jan Withagen wrote:
> > > On 17-12-2012 20:16, Jim Harris wrote:
> > > > The timeouts are occurring on inquiry commands to non-zero LUNs.
> > > > arcmsr(4) is returning CAM_SEL_TIMEOUT instead of
> > > > CAM_DEV_NOT_THERE for inquiry commands to this device and
> > > > LUN > 0. CAM_DEV_NOT_THERE is preferred to remove these types
> > > > of warnings, and similar patches have gone into for other SCSI
> > > > drivers recently.
> > > >
> > > > Can you try this patch?
> > > >
> > > > Index: sys/dev/arcmsr/arcmsr.c
> > > > ===================================================================
> > > > --- sys/dev/arcmsr/arcmsr.c (revision 244190)
> > > > +++ sys/dev/arcmsr/arcmsr.c (working copy)
> > > > @@ -2439,7 +2439,7 @@
> > > >                 char *buffer=pccb->csio.data_ptr;
> > > >
> > > >                 if (pccb->ccb_h.target_lun) {
> > > > -                       pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> > > > +                       pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
> > > >                         xpt_done(pccb);
> > > >                         return;
> > > >                 }
> > >
> > > Hi Jim,
> > >
> > > The noise has gone down by a factor of 5, now I get:
> > >
> > > (probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
> > > (probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
> > > (probe6:arcmsr0:0:16:1): Error 5, Unretryable error
> > > (probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0
> > >
> > > Which is defined in sys/cam/cam.c ....
> > > as CAM_UA_TERMIO, but that error is nowhere set in the arcmsr code....
> >
> > There is something out of sync on your system. I just noticed this,
> > but your original error messages were showing "Command timeout"
> > (CAM_CMD_TIMEOUT) even though the driver was returning
> > CAM_SEL_TIMEOUT. Now in this case, driver is returning
> > CAM_DEV_NOT_THERE, but CAM is printing error message for
> > CAM_UA_TERMIO.
> > In both cases, driver is returning value X, but cam is interpreting
> > it as X+1. So CAM and arcmsr(4) seem to have a different idea of the
> > values of the cam_status enumeration.
> >
> > Can you provide details on your build environment? Are you building
> > arcmsr as a loadable module or do you specify "device arcmsr" in your
> > kernel config to link it statically? I'm suspecting loadable module,
> > although I have no idea how these values would get out of sync since
> > this enumeration hasn't changed in probably 10+ years.
>
> arcmsr is build in the kernel
>
> [/usr/src] wjw@zfs.digiware.nl> kldstat
> Id Refs Address            Size     Name
>  1   28 0xffffffff80200000 b55be0   kernel
>  2    1 0xffffffff80d56000 6138     nullfs.ko
>  3    1 0xffffffff80d5d000 2153b0   zfs.ko
>  4    2 0xffffffff80f73000 5e38     opensolaris.ko
>  5    1 0xffffffff80f79000 f510     aio.ko
>  6    1 0xffffffff80f89000 2a20     coretemp.ko
>  7    1 0xffffffff81012000 316d4    nfscl.ko
>  8    2 0xffffffff81044000 10827    nfscommon.ko
>
> And I just refetched 9.1-PRERELEASE this afternoon over svn....
>
> Could this have something to do with Clang <> gcc ????
> Not that I did anything to change this.
>
> Note that I have nothing changed other than the KERNEL CONFIG file.
>
> And both kernel and world were build at the same time this afternoon.
> With your patch I just only rebuild kernel and modules.

Never mind my earlier comment about CAM and the driver being out of
sync. It's another bug in arcmsr(4): CAM_REQ_CMP == 0x1, and
arcmsr_handle_virtual_command() ORs it into the status before it looks
at the command. In the LUN > 0 case the error value then gets OR'd on
top of it, which is what caused the off-by-one we were seeing.
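
To make the arithmetic concrete, here is a small standalone sketch (not
driver code). The enum is a local copy of a few cam_status values,
numbered the way I believe sys/cam/cam.h orders them; treat those
numbers as an assumption and check them against your tree. The helpers
buggy_status(), fixed_status(), status_text() and self_check() are made
up for illustration, and the sketch assumes the CCB status starts out
as zero.

```c
#include <assert.h>
#include <string.h>

/*
 * Local copy of a few cam_status values, numbered the way I believe
 * sys/cam/cam.h orders them (assumption, verify against your tree).
 */
enum cam_status_copy {
	CAM_REQ_INPROG    = 0x00,
	CAM_REQ_CMP       = 0x01,
	CAM_DEV_NOT_THERE = 0x08,
	CAM_UA_TERMIO     = 0x09,
	CAM_SEL_TIMEOUT   = 0x0a,
	CAM_CMD_TIMEOUT   = 0x0b
};

/* Hypothetical helper: old flow, CAM_REQ_CMP is OR'd in before the error. */
static int
buggy_status(int error)
{
	int status = CAM_REQ_INPROG;	/* fresh status, assumed to be zero */

	status |= CAM_REQ_CMP;		/* done up front for every command */
	status |= error;		/* LUN > 0 path ORs the error on top */
	return (status);
}

/* Hypothetical helper: patched flow, one value is OR'd in per path. */
static int
fixed_status(int error)
{
	int status = CAM_REQ_INPROG;

	status |= error;
	return (status);
}

/* Status text seen in the logs earlier in this thread, keyed by value. */
static const char *
status_text(int status)
{
	switch (status) {
	case CAM_UA_TERMIO:
		return ("Unable to terminate I/O CCB request");
	case CAM_CMD_TIMEOUT:
		return ("Command timeout");
	default:
		return ("(not listed in this sketch)");
	}
}

/* Checks that the sketch reproduces both symptoms and the fixed result. */
static void
self_check(void)
{
	/* Original driver: CAM_SEL_TIMEOUT showed up as "Command timeout". */
	assert(buggy_status(CAM_SEL_TIMEOUT) == CAM_CMD_TIMEOUT);
	assert(strcmp(status_text(buggy_status(CAM_SEL_TIMEOUT)),
	    "Command timeout") == 0);

	/* First patch: CAM_DEV_NOT_THERE showed up as CAM_UA_TERMIO. */
	assert(buggy_status(CAM_DEV_NOT_THERE) == CAM_UA_TERMIO);
	assert(strcmp(status_text(buggy_status(CAM_DEV_NOT_THERE)),
	    "Unable to terminate I/O CCB request") == 0);

	/* Without the up-front OR, the value reaches CAM unchanged. */
	assert(fixed_status(CAM_DEV_NOT_THERE) == CAM_DEV_NOT_THERE);
}
```

Calling self_check() from a throwaway main() should pass silently if
the assumed values are right. Both error values used here are even, so
OR-ing in 0x1 behaves exactly like adding one, which matches the "X+1"
pattern in the logs.
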

Please try the following patch instead (reverting the earlier patch):

Index: sys/dev/arcmsr/arcmsr.c
===================================================================
--- sys/dev/arcmsr/arcmsr.c	(revision 244190)
+++ sys/dev/arcmsr/arcmsr.c	(working copy)
@@ -2432,14 +2432,13 @@
 static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb,
 		union ccb * pccb)
 {
-	pccb->ccb_h.status |= CAM_REQ_CMP;
 	switch (pccb->csio.cdb_io.cdb_bytes[0]) {
 	case INQUIRY: {
 		unsigned char inqdata[36];
 		char *buffer=pccb->csio.data_ptr;
 
 		if (pccb->ccb_h.target_lun) {
-			pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
+			pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
 			xpt_done(pccb);
 			return;
 		}
@@ -2455,6 +2454,7 @@
 		strncpy(&inqdata[16], "RAID controller ", 16); /* Product Identification */
 		strncpy(&inqdata[32], "R001", 4); /* Product Revision */
 		memcpy(buffer, inqdata, sizeof(inqdata));
+		pccb->ccb_h.status |= CAM_REQ_CMP;
 		xpt_done(pccb);
 	}
 	break;
@@ -2464,10 +2464,12 @@
 			pccb->ccb_h.status |= CAM_SCSI_STATUS_ERROR;
 			pccb->csio.scsi_status = SCSI_STATUS_CHECK_COND;
 		}
+		pccb->ccb_h.status |= CAM_REQ_CMP;
 		xpt_done(pccb);
 	}
 	break;
 	default:
+		pccb->ccb_h.status |= CAM_REQ_CMP;
 		xpt_done(pccb);
 	}
 }