Date:      Mon, 17 Dec 2012 15:43:09 -0700
From:      Jim Harris <jim.harris@gmail.com>
To:        Willem Jan Withagen <wjw@digiware.nl>
Cc:        FreeBSD Stable Users <freebsd-stable@freebsd.org>
Subject:   Re: Strange CAM errors
Message-ID:  <CAJP=Hc_8fdOH_%2B2GbTsSoNY%2BqTc61qp2kRsfUd3DNTxFQ9mu4w@mail.gmail.com>
In-Reply-To: <50CF9ADD.7080202@digiware.nl>
References:  <50CEFAC5.8000002@digiware.nl> <572946ED30FA47C69D6DCDD511CF6EB2@multiplay.co.uk> <50CF47A5.4090008@digiware.nl> <CAJP=Hc9q50qe4tXxmek_ZD6j=1CNQFwiO9XxtniOLdHZz6gWxw@mail.gmail.com> <50CF925C.5040106@digiware.nl> <CAJP=Hc-4mjWON5=Qi=WVzZ_wzGzz06MjX6w9S5t=xFfyAQ7jbA@mail.gmail.com> <50CF9ADD.7080202@digiware.nl>

On Mon, Dec 17, 2012 at 3:21 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:

> On 17-12-2012 23:10, Jim Harris wrote:
> >
> >
> > On Mon, Dec 17, 2012 at 2:45 PM, Willem Jan Withagen <wjw@digiware.nl>
> > wrote:
> >
> >     On 17-12-2012 20:16, Jim Harris wrote:
> >     > The timeouts are occurring on inquiry commands to non-zero LUNs.
> >     > arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE
> >     > for inquiry commands to this device and LUN > 0.  CAM_DEV_NOT_THERE
> >     > is preferred to remove these types of warnings, and similar patches
> >     > have gone into other SCSI drivers recently.
> >     >
> >     > Can you try this patch?
> >     >
> >     > Index: sys/dev/arcmsr/arcmsr.c
> >     > ===================================================================
> >     > --- sys/dev/arcmsr/arcmsr.c     (revision 244190)
> >     > +++ sys/dev/arcmsr/arcmsr.c     (working copy)
> >     > @@ -2439,7 +2439,7 @@
> >     >                 char *buffer=pccb->csio.data_ptr;
> >     >
> >     >                 if (pccb->ccb_h.target_lun) {
> >     > -                       pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> >     > +                       pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
> >     >                         xpt_done(pccb);
> >     >                         return;
> >     >                 }
> >     >
> >
> >     Hi Jim,
> >
> >     The noise has gone down by a factor of 5, now I get:
> >
> >     (probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
> >     (probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
> >     (probe6:arcmsr0:0:16:1): Error 5, Unretryable error
> >     (probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0
> >
> >     Which is defined in sys/cam/cam.c as CAM_UA_TERMIO, but that error
> >     is nowhere set in the arcmsr code....
> >
> >
> > There is something out of sync on your system.  I just noticed this, but
> > your original error messages were showing "Command timeout"
> > (CAM_CMD_TIMEOUT) even though the driver was returning CAM_SEL_TIMEOUT.
> > Now in this case, driver is returning CAM_DEV_NOT_THERE, but CAM is
> > printing error message for CAM_UA_TERMIO.  In both cases, driver is
> > returning value X, but cam is interpreting it as X+1.  So CAM and
> > arcmsr(4) seem to have a different idea of the values of the cam_status
> > enumeration.
> >
> > Can you provide details on your build environment?  Are you building
> > arcmsr as a loadable module or do you specify "device arcmsr" in your
> > kernel config to link it statically?  I'm suspecting loadable module,
> > although I have no idea how these values would get out of sync since
> > this enumeration hasn't changed in probably 10+ years.
>
> arcmsr is built into the kernel
>
> [/usr/src] wjw@zfs.digiware.nl> kldstat
> Id Refs Address            Size     Name
>  1   28 0xffffffff80200000 b55be0   kernel
>  2    1 0xffffffff80d56000 6138     nullfs.ko
>  3    1 0xffffffff80d5d000 2153b0   zfs.ko
>  4    2 0xffffffff80f73000 5e38     opensolaris.ko
>  5    1 0xffffffff80f79000 f510     aio.ko
>  6    1 0xffffffff80f89000 2a20     coretemp.ko
>  7    1 0xffffffff81012000 316d4    nfscl.ko
>  8    2 0xffffffff81044000 10827    nfscommon.ko
>
> And I just refetched 9.1-PRERELEASE this afternoon over svn....
>
> Could this have something to do with Clang <> gcc ????
> Not that I did anything to change this.
>
> Note that I have changed nothing other than the KERNEL CONFIG file.
>
> And both kernel and world were built at the same time this afternoon.
> With your patch I only rebuilt the kernel and modules.
>
>
Never mind my earlier comment about things being out of sync.  It's another
bug in arcmsr(4): the handler sets CAM_REQ_CMP (== 0x1) on entry, and in the
LUN > 0 case it then ORs the error status on top of it, which causes the
off-by-one values we were seeing.

Please try the following patch instead (after reverting the earlier patch):

Index: sys/dev/arcmsr/arcmsr.c
===================================================================
--- sys/dev/arcmsr/arcmsr.c     (revision 244190)
+++ sys/dev/arcmsr/arcmsr.c     (working copy)
@@ -2432,14 +2432,13 @@
 static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb,
                union ccb * pccb)
 {
-       pccb->ccb_h.status |= CAM_REQ_CMP;
        switch (pccb->csio.cdb_io.cdb_bytes[0]) {
        case INQUIRY: {
                unsigned char inqdata[36];
                char *buffer=pccb->csio.data_ptr;

                if (pccb->ccb_h.target_lun) {
-                       pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
+                       pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
                        xpt_done(pccb);
                        return;
                }
@@ -2455,6 +2454,7 @@
                strncpy(&inqdata[16], "RAID controller ", 16);  /* Product Identification */
                strncpy(&inqdata[32], "R001", 4); /* Product Revision */
                memcpy(buffer, inqdata, sizeof(inqdata));
+               pccb->ccb_h.status |= CAM_REQ_CMP;
                xpt_done(pccb);
        }
        break;
@@ -2464,10 +2464,12 @@
                        pccb->ccb_h.status |= CAM_SCSI_STATUS_ERROR;
                        pccb->csio.scsi_status = SCSI_STATUS_CHECK_COND;
                }
+               pccb->ccb_h.status |= CAM_REQ_CMP;
                xpt_done(pccb);
        }
        break;
        default:
+               pccb->ccb_h.status |= CAM_REQ_CMP;
                xpt_done(pccb);
        }
 }


