Date:      Wed, 6 Feb 2019 16:18:37 +0100
From:      Borja Marcos <borjam@sarenet.es>
To:        Karl Denninger <karl@denninger.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 9211 (LSI/SAS) issues on 11.2-STABLE
Message-ID:  <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>
In-Reply-To: <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>
References:  <7bb25f55-fa77-f67e-11f3-b2240b01e25a@denninger.net> <b50c527c-e7f7-3e64-af3a-e597ec77c021@denninger.net> <9ea70420-0c06-ad9d-e8b7-f9d92fed20d8@denninger.net> <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>



> On 5 Feb 2019, at 23:49, Karl Denninger <karl@denninger.net> wrote:
>
> BTW under 12.0-STABLE (built this afternoon after the advisories came
> out, with the patches) it's MUCH worse.  I get the same device resets
> BUT it's followed by an immediate panic which I cannot dump as it
> generates a page-fault (supervisor read data, page not present) in the
> mps *driver* at mpssas_send_abort+0x21.

> This precludes a dump of course since attempting to do so gives you a
> double-panic (I was wondering why I didn't get a crash dump!); I'll
> re-jigger the box to stick a dump device on an internal SATA device so I
> can successfully get the dump when it happens and see if I can obtain a
> proper crash dump on this.
>
> I think it's fair to assume that 12.0-STABLE should not panic on a disk
> problem (unless of course the problem is trying to page something back
> in -- it's not, the drive that aborts and resets is on a data pack doing
> a scrub)

It shouldn’t panic, I imagine.

>>>> mps0: Sending reset from mpssas_send_abort for target ID 37


>> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> Number of Hardware Resets has incremented.  There are no other errors shown:

What is _exactly_ that value? Is it related to the number of resets sent
from the HBA _or_ the device resetting by itself?
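
To compare two such snapshots mechanically rather than by eye, here is a
minimal Python sketch. It assumes the row layout of the table quoted above
(page, offset, size, value, flags, description); the names parse_stats and
changed_counters are made up for illustration and are not part of any tool.
It only shows which counter moved and says nothing about what caused it.

```python
import re

# One counter row of the quoted table:
# page, offset, size, value, flags, description.
ROW = re.compile(
    r"^(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+)$"
)


def parse_stats(text):
    """Return {(page, offset, description): value} for every counter row."""
    stats = {}
    for line in text.splitlines():
        # Drop mail quote markers ("> ", ">> ") before matching.
        m = ROW.match(line.lstrip("> ").rstrip())
        if m:
            page, offset, _size, value, _flags, desc = m.groups()
            stats[(page, offset, desc)] = int(value)
    return stats


def changed_counters(before, after):
    """Return {description: (old, new)} for counters whose value differs."""
    old, new = parse_stats(before), parse_stats(after)
    return {
        key[2]: (old[key], new[key])
        for key in old
        if key in new and old[key] != new[key]
    }


# The two snapshots quoted above, legend lines omitted.
BEFORE = """\
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""
AFTER = """\
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""

print(changed_counters(BEFORE, AFTER))
```

Header and legend lines do not match the row pattern and are skipped, as are
rows whose value column is not a plain number.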

>> I'd throw possible shade at the backplane or cable /but I have already
>> swapped both out for spares without any change in behavior./

What about the power supply?

Borja.




