Date: Mon, 17 Dec 2012 22:45:00 +0100
From: Willem Jan Withagen <wjw@digiware.nl>
To: Jim Harris <jimharris@freebsd.org>
Cc: FreeBSD Stable Users <freebsd-stable@freebsd.org>, Steven Hartland <killing@multiplay.co.uk>
Subject: Re: Strange CAM errors
Message-ID: <50CF925C.5040106@digiware.nl>
In-Reply-To: <CAJP=Hc9q50qe4tXxmek_ZD6j=1CNQFwiO9XxtniOLdHZz6gWxw@mail.gmail.com>
References: <50CEFAC5.8000002@digiware.nl> <572946ED30FA47C69D6DCDD511CF6EB2@multiplay.co.uk> <50CF47A5.4090008@digiware.nl> <CAJP=Hc9q50qe4tXxmek_ZD6j=1CNQFwiO9XxtniOLdHZz6gWxw@mail.gmail.com>

On 17-12-2012 20:16, Jim Harris wrote:
> On Mon, Dec 17, 2012 at 9:26 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>
>> On 2012-12-17 15:38, Steven Hartland wrote:
>>> Check the smart results of each disk in the array; you may have a
>>> failing disk.
>>>
>>> ----- Original Message -----
>>> From: "Willem Jan Withagen" <wjw@digiware.nl>
>>> To: "FreeBSD Stable Users" <freebsd-stable@freebsd.org>
>>> Sent: Monday, December 17, 2012 10:58 AM
>>> Subject: Strange CAM errors
>>>
>>>> Hi,
>>>>
>>>> I have not noticed this before, but my system rebooted this morning
>>>> and in the following security report I found a lot of messages in the
>>>> dmesg-part like:
>>>>
>>>> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
>>>> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
>>>> +(probe0:arcmsr0:0:16:1): Retrying command
>>>> +(probe0:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
>>>> +(probe0:arcmsr0:0:16:1): CAM status: Command timeout
>>>> +(probe0:arcmsr0:0:16:1): Retrying command
>>>>
>>>> And it seems that bus 16 is:
>>>> +pass6 at arcmsr0 bus 0 scbus0 target 16 lun 0
>>>> +pass6: <Areca RAID controller R001> Fixed Processor SCSI-0 device
>>>>
>>>> The system has been running
>>>> FreeBSD zfs.digiware.nl 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #3: Wed
>>>> Nov 14 13:25:55 CET 2012
>>>> root@zfs.digiware.nl:/usr/obj/usr/srcs/src9/src/sys/ZFS amd64
>>>> for a while already.
>>>>
>>>> Does anybody have suggestions as to why I get these messages?
>>>>
>>>> They occur during the boot sequence, so smartd is not talking to the
>>>> disks at that moment.
>>>>
>>>> --WjW
>>>>
>>>> ps: dmesg, config, etc.... at:
>>>> http://www.tegenbosch28.nl/FreeBSD/Systems/ZFS
>>>> ps2: upgrading to the most recent 9.1
>>
>> 'mmm,
>>
>> Smartd seems to think otherwise...
>>
>> 'camcontrol rescan all' actually delivers the same pack of errors.
>>
>> --WjW
>
> The timeouts are occurring on inquiry commands to non-zero LUNs.
> arcmsr(4) is returning CAM_SEL_TIMEOUT instead of CAM_DEV_NOT_THERE for
> inquiry commands to this device and LUN > 0. CAM_DEV_NOT_THERE is
> preferred to remove these types of warnings, and similar patches have
> gone into other SCSI drivers recently.
>
> Can you try this patch?
>
> Index: sys/dev/arcmsr/arcmsr.c
> ===================================================================
> --- sys/dev/arcmsr/arcmsr.c (revision 244190)
> +++ sys/dev/arcmsr/arcmsr.c (working copy)
> @@ -2439,7 +2439,7 @@
>      char *buffer=pccb->csio.data_ptr;
>
>      if (pccb->ccb_h.target_lun) {
> -        pccb->ccb_h.status |= CAM_SEL_TIMEOUT;
> +        pccb->ccb_h.status |= CAM_DEV_NOT_THERE;
>          xpt_done(pccb);
>          return;
>      }

Hi Jim,

The noise has gone down by a factor of 5, now I get:

(probe6:arcmsr0:0:16:1): INQUIRY. CDB: 12 20 0 0 24 0
(probe6:arcmsr0:0:16:1): CAM status: Unable to terminate I/O CCB request
(probe6:arcmsr0:0:16:1): Error 5, Unretryable error
(probe6:arcmsr0:0:16:2): INQUIRY. CDB: 12 40 0 0 24 0

Which is defined in sys/cam/cam.c .... as CAM_UA_TERMIO, but that error
is nowhere set in the arcmsr code....
So I clearly do not yet know enough to help with this.

--WjW

For all of the ports on the adapter.
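
One possible reading of the CAM_UA_TERMIO result, not confirmed anywhere in this thread: the patched line uses `|=`, but CAM status codes are plain numbers rather than bit flags, so OR-ing a code into a status that is already non-zero yields a different code. The sketch below assumes the numeric values I remember from the cam_status enum in sys/cam/cam.h (check them against your tree) and assumes the status held CAM_REQ_CMP beforehand. The SK_ names and both helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed subset of cam_status from sys/cam/cam.h; verify in your tree. */
enum sk_cam_status {
    SK_CAM_REQ_INPROG    = 0x00,
    SK_CAM_REQ_CMP       = 0x01,
    SK_CAM_DEV_NOT_THERE = 0x08,
    SK_CAM_UA_TERMIO     = 0x09,
    SK_CAM_SEL_TIMEOUT   = 0x0a,
    SK_CAM_CMD_TIMEOUT   = 0x0b,
    SK_CAM_STATUS_MASK   = 0x3f
};

/* What the driver line does: OR the code into whatever is already there. */
static uint32_t status_or(uint32_t status, uint32_t code)
{
    return status | code;
}

/* Hypothetical alternative: clear the status-code bits, then set the code. */
static uint32_t status_set(uint32_t status, uint32_t code)
{
    return (status & ~(uint32_t)SK_CAM_STATUS_MASK) | code;
}

/* Checks the arithmetic under the assumptions stated above. */
static void check_examples(void)
{
    /* Before the patch: 0x01 | 0x0a == 0x0b, reported as "Command timeout". */
    assert(status_or(SK_CAM_REQ_CMP, SK_CAM_SEL_TIMEOUT) == SK_CAM_CMD_TIMEOUT);
    /* After the patch: 0x01 | 0x08 == 0x09, i.e. CAM_UA_TERMIO. */
    assert(status_or(SK_CAM_REQ_CMP, SK_CAM_DEV_NOT_THERE) == SK_CAM_UA_TERMIO);
    /* Clearing the code bits first gives the intended status. */
    assert(status_set(SK_CAM_REQ_CMP, SK_CAM_DEV_NOT_THERE) == SK_CAM_DEV_NOT_THERE);
}
```

If that assumption holds, it would account for both the "Command timeout" text before the patch and the "Unable to terminate I/O CCB request" text after it, even though arcmsr never sets either code explicitly. Whether the status really is non-zero at that point has to be checked in arcmsr.c.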