Date: Wed, 6 Feb 2019 16:18:37 +0100
From: Borja Marcos <borjam@sarenet.es>
To: Karl Denninger <karl@denninger.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: 9211 (LSI/SAS) issues on 11.2-STABLE
Message-ID: <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>
In-Reply-To: <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>
References: <7bb25f55-fa77-f67e-11f3-b2240b01e25a@denninger.net>
 <b50c527c-e7f7-3e64-af3a-e597ec77c021@denninger.net>
 <9ea70420-0c06-ad9d-e8b7-f9d92fed20d8@denninger.net>
 <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>

> On 5 Feb 2019, at 23:49, Karl Denninger <karl@denninger.net> wrote:
>
> BTW under 12.0-STABLE (built this afternoon after the advisories came
> out, with the patches) it's MUCH worse.  I get the same device resets
> BUT it's followed by an immediate panic which I cannot dump as it
> generates a page-fault (supervisor read data, page not present) in the
> mps *driver* at mpssas_send_abort+0x21.
> This precludes a dump of course since attempting to do so gives you a
> double-panic (I was wondering why I didn't get a crash dump!); I'll
> re-jigger the box to stick a dump device on an internal SATA device so I
> can successfully get the dump when it happens and see if I can obtain a
> proper crash dump on this.
>
> I think it's fair to assume that 12.0-STABLE should not panic on a disk
> problem (unless of course the problem is trying to page something back
> in -- it's not, the drive that aborts and resets is on a data pack doing
> a scrub)

It shouldn't panic, I imagine.

>>>> mps0: Sending reset from mpssas_send_abort for target ID 37

>> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> Number of Hardware Resets has incremented.  There are no other errors shown:

What is _exactly_ that value? Is it related to the number of resets sent
from the HBA _or_ the device resetting by itself?

>> I'd throw possible shade at the backplane or cable /but I have already
>> swapped both out for spares without any change in behavior./

What about the power supply?

Borja.