Date:      Mon, 17 Dec 2012 22:45:00 +0100
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Jim Harris <jimharris@freebsd.org>
Cc:        FreeBSD Stable Users <freebsd-stable@freebsd.org>, Steven Hartland <killing@multiplay.co.uk>
Subject:   Re: Strange CAM errors
Message-ID:  <50CF925C.5040106@digiware.nl>
In-Reply-To: <CAJP=Hc9q50qe4tXxmek_ZD6j=1CNQFwiO9XxtniOLdHZz6gWxw@mail.gmail.com>
References:  <50CEFAC5.8000002@digiware.nl> <572946ED30FA47C69D6DCDD511CF6EB2@multiplay.co.uk> <50CF47A5.4090008@digiware.nl> <CAJP=Hc9q50qe4tXxmek_ZD6j=1CNQFwiO9XxtniOLdHZz6gWxw@mail.gmail.com>

On 17-12-2012 20:16, Jim Harris wrote:
> 
> 
> On Mon, Dec 17, 2012 at 9:26 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> 
>     On 2012-12-17 15:38, Steven Hartland wrote:
>     > Check the SMART results of each disk in the array; you may have a
>     > failing disk.
>     > ----- Original Message ----- From: "Willem Jan Withagen" <wjw@digiware.nl>
>     > To: "FreeBSD Stable Users" <freebsd-stable@freebsd.org>
>     > Sent: Monday, December 17, 2012 10:58 AM
>     > Subject: Strange CAM errors
>     >
>     >
>     >> Hi,
>     >>
>     >> I have not noticed this before, but my system rebooted this morning and
>     >> in the following security report I found a lot of messages in the
>     >> dmesg part like:
>     >>
>     >> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
>     >> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
>     >> +(probe0:arcmsr0:0:16:1): Retrying command
>     >> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
>     >> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
>     >> +(probe0:arcmsr0:0:16:1): Retrying command
>     >>
>     >> And it seems that bus 16 is:
>     >> +pass6 at arcmsr0 bus 0 scbus0 target 16 lun 0
>     >> +pass6: <Areca RAID controller R001> Fixed Processor SCSI-0 device
>     >>
>     >> The system has been running
>     >> FreeBSD zfs.digiware.nl 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Wed
>     >> Nov 14 13:25:55 CET 2012
>     >> root@zfs.digiware.nl:/usr/obj/usr/srcs/src9/src/sys/ZFS  amd64
>     >> for already a while.
>     >>
>     >> Any suggestions as to why I get these messages?
>     >>
>     >> They appear during the boot sequence, so smartd is not talking to the
>     >> disks at that moment.
>     >>
>     >> --WjW
>     >>
>     >> ps: dmesg, config, etc. at:
>     >> http://www.tegenbosch28.nl/FreeBSD/Systems/ZFS
>     >> ps2: upgrading to the most recent 9.1
> 
>     'mmm,
> 
>     Smartd seems to think otherwise...
> 
>     'camcontrol rescan all' actually delivers the same pack of errors.
> 
>     --WjW
> 
> 
> The timeouts are occurring on inquiry commands to non-zero LUNs. 
> arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for
> inquiry commands to this device and LUN > 0.  CAM_DEV_NOT_THERE is
> preferred to remove these types of warnings, and similar patches have
> gone in for other SCSI drivers recently.
> 
> Can you try this patch?
> 
> Index: sys/dev/arcmsr/arcmsr.c
> ===================================================================
> --- sys/dev/arcmsr/arcmsr.c     (revision 244190)
> +++ sys/dev/arcmsr/arcmsr.c     (working copy)
> @@ -2439,7 +2439,7 @@
>                 char *buffer=pccb->csio.data_ptr;
>  
>                 if (pccb->ccb_h.target_lun) {
> -                       pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> +                       pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
>                         xpt_done(pccb);
>                         return;
>                 }
> 

Hi Jim,

The noise has gone down by a factor of 5. Now I get:

(probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
(probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
(probe6:arcmsr0:0:16:1): Error 5, Unretryable error
(probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0

That status string is defined in sys/cam/cam.c for CAM_UA_TERMIO, but that
error is not set anywhere in the arcmsr code.
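
One possible explanation, offered as a guess and not confirmed anywhere in
this thread: the patched line ORs the new code into ccb_h.status instead of
assigning it. Assuming the cam_status values in sys/cam/cam.h are the ones
listed below, and assuming the status field already holds CAM_REQ_CMP (0x01)
when that line runs, the OR turns CAM_DEV_NOT_THERE (0x08) into CAM_UA_TERMIO
(0x09). The same assumption would also explain why the unpatched driver
reported "Command timeout" (0x0b) although it sets CAM_SEL_TIMEOUT (0x0a).
The helper names or_status and set_status are made up for this sketch and are
not part of arcmsr or CAM.

```c
/*
 * Sketch only. The numeric values are assumed to match the cam_status
 * enum in sys/cam/cam.h; check them against your source tree.
 */
enum cam_status_sketch {
    CAM_REQ_INPROG    = 0x00, /* CCB request is in progress */
    CAM_REQ_CMP       = 0x01, /* CCB request completed without error */
    CAM_DEV_NOT_THERE = 0x08, /* SCSI device not installed/there */
    CAM_UA_TERMIO     = 0x09, /* Unable to terminate I/O CCB request */
    CAM_SEL_TIMEOUT   = 0x0a, /* Target selection timeout */
    CAM_CMD_TIMEOUT   = 0x0b, /* Command timeout */
    CAM_STATUS_MASK   = 0x3f  /* Mask for the status code bits */
};

/* Hypothetical helper: what "status |= code" leaves in the code bits. */
static unsigned int
or_status(unsigned int current, unsigned int code)
{
    return (current | code) & CAM_STATUS_MASK;
}

/* Hypothetical helper: replace the code bits, keep any flag bits above them. */
static unsigned int
set_status(unsigned int current, unsigned int code)
{
    return (current & ~(unsigned int)CAM_STATUS_MASK) | code;
}
```

With these values, or_status(CAM_REQ_CMP, CAM_DEV_NOT_THERE) yields
CAM_UA_TERMIO, while set_status(CAM_REQ_CMP, CAM_DEV_NOT_THERE) yields
CAM_DEV_NOT_THERE. If the assumption about the initial status holds, a plain
assignment in the driver might be worth trying.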

So I clearly do not yet know enough to help with this.

--WjW


ps: I get these messages for all of the ports on the adapter.


