Date:      Thu, 3 Mar 2016 10:09:37 -0700
From:      Scott Long <scott4long@yahoo.com>
To:        Borja Marcos <borjam@sarenet.es>
Cc:        Steven Hartland <killing@multiplay.co.uk>, FreeBSD-scsi <freebsd-scsi@freebsd.org>
Subject:   Re: mpr(4) SAS3008 Repeated Crashing
Message-ID:  <F5E05621-FF84-4BED-B1A7-3252715CD53B@yahoo.com>
In-Reply-To: <F9B68610-12C6-4D32-88CA-A34A185F9AD1@sarenet.es>
References:  <56D5FDB8.8040402@freebsd.org> <56D612FA.6090909@multiplay.co.uk> <A8859ECA-0B58-42A8-AA49-DF6AA3D52CC6@sarenet.es> <E74F5225-1EA8-4B60-ADDC-7B13E1003184@yahoo.com> <D7E0BCCE-EB44-4EF9-8F17-474C162F7D7C@sarenet.es> <56D805FD.50500@multiplay.co.uk> <F9B68610-12C6-4D32-88CA-A34A185F9AD1@sarenet.es>

> On Mar 3, 2016, at 3:26 AM, Borja Marcos <borjam@sarenet.es> wrote:
>
>
>> On 03 Mar 2016, at 10:38, Steven Hartland <killing@multiplay.co.uk> wrote:
>>
>> We've seen HW issues before where the first thing to start triggering
>> the problem was TRIM requests. It seems like it's an afterthought in
>> most FWs, unfortunately, so it's one of the first things to go bad.
>> I'm not saying this is your issue, but it's something to keep in mind.
>
> Thanks :)
>
> Not TRIM related, it seems. I've run the tests with
> kern.cam.X.delete_method set to DISABLE and I still see errors:
>
> In paranormal cases like this it would be awesome to have access to a
> logic analyzer… I keep dreaming, of course :)
>
> Mar  3 11:12:53 clientes-ssd8 kernel: (noperiph:mpr0:0:4294967295:0): SMID 2 Aborting command 0xfffffe0000c7cab0
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): READ(10). CDB: 28 00 26 d7 d0 98 00 00 20 00 length 16384 SMID 322 terminated ioc 804b scsi 0 state c xfer 0
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): READ(10). CDB: 28 00 26 d7 d0 b8 00 00 18 00 length 12288 SMID 217 terminated ioc 804b scsi 0 state c xfer 0
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 205 terminated ioc 804b scsi 0 sta(da2:mpr0:0:26:0): READ(10). CDB: 28 00 26 d7 d0 18 00 00 78 00
> Mar  3 11:12:54 clientes-ssd8 kernel: te c xfer 0
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): CAM status: Command timeout
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): Retrying command
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): CAM status: SCSI Status Error
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): SCSI status: Check Condition
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> Mar  3 11:12:54 clientes-ssd8 kernel: (da2:mpr0:0:26:0): Retrying command (per sense data)
>
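
For anyone reading the quoted log, the READ(10) CDB bytes can be decoded
by hand, which shows where on the disk the terminated reads were aimed.
The sketch below assumes the standard READ(10) layout (opcode 0x28,
big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8).
The helper name decode_read10 is hypothetical and not part of any
FreeBSD tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Assumes the standard 10-byte layout:
    byte 0 opcode (0x28), bytes 2-5 LBA, bytes 7-8 transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # starting logical block
    blocks = int.from_bytes(cdb[7:9], "big")   # number of blocks requested
    return lba, blocks
```

Applied to the first READ in the log (28 00 26 d7 d0 98 00 00 20 00),
this gives LBA 0x26d7d098 and 32 blocks. With the logged length of 16384
bytes, that works out to 512 bytes per block.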

SYNC CACHE seems to have been involved this time, and while it's
sometimes a source of trouble with SATA disks, I'm very hesitant to
blame it.  Given the seemingly random nature of your problems, I'm no
longer as confident in ruling out a fault in the disk enclosure.  This
looks to be a different disk than in your last report, and your
statement that a sibling system exhibits no problems is very
interesting.  Maybe there's an issue with the power supply, and the
disks are periodically seeing under-voltage conditions.  If you can run
smartctl against the disks, the output might be useful.  Also, if you're
able, could you make sure that both this system and the one that is
working well are being fed sufficient and similar AC power?  And if the
power supply modules in your enclosures are swappable, maybe swap them
between systems and see whether the problem follows the module?  If that
doesn't fix it, I'll think of ways to provide more instrumentation.
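
To check whether the errors really do move between disks, something like
the following rough sketch could tally timeouts and terminated commands
per da device from the kernel log, then list the smartctl invocations to
run for the affected disks.  The function names are hypothetical, the
regular expression is only modelled on the mpr(4) lines quoted above,
and the /dev/daN device paths are an assumption about how the disks
appear on this system.

```python
import re
from collections import Counter

# Matches the "(da2:mpr0:0:26:0): message" prefix seen in the quoted log.
PERIPH = re.compile(r"\((da\d+):(mpr\d+):\d+:(\d+):\d+\): (.*)")

def tally_events(lines):
    """Count command timeouts and terminated commands per da device."""
    timeouts = Counter()
    terminated = Counter()
    for line in lines:
        m = PERIPH.search(line)
        if not m:
            continue  # e.g. noperiph lines or split-up fragments
        dev, _ctrl, _target, msg = m.groups()
        if "CAM status: Command timeout" in msg:
            timeouts[dev] += 1
        if "terminated ioc" in msg:
            terminated[dev] += 1
    return timeouts, terminated

def smartctl_commands(devices):
    """Build 'smartctl -a' argument lists; /dev/<name> is an assumption."""
    return [["smartctl", "-a", f"/dev/{dev}"] for dev in sorted(devices)]
```

Feeding it /var/log/messages from both the failing system and its
healthy sibling would show whether one disk dominates or the errors are
spread across the enclosure, which bears on the power supply theory.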

Scott



