Date: Wed, 6 Feb 2019 16:18:37 +0100
From: Borja Marcos <borjam@sarenet.es>
To: Karl Denninger <karl@denninger.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: 9211 (LSI/SAS) issues on 11.2-STABLE
Message-ID: <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>
In-Reply-To: <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>
References: <7bb25f55-fa77-f67e-11f3-b2240b01e25a@denninger.net>
 <b50c527c-e7f7-3e64-af3a-e597ec77c021@denninger.net>
 <9ea70420-0c06-ad9d-e8b7-f9d92fed20d8@denninger.net>
 <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>

> On 5 Feb 2019, at 23:49, Karl Denninger <karl@denninger.net> wrote:
>
> BTW under 12.0-STABLE (built this afternoon after the advisories came
> out, with the patches) it's MUCH worse. I get the same device resets
> BUT it's followed by an immediate panic which I cannot dump as it
> generates a page-fault (supervisor read data, page not present) in the
> mps *driver* at mpssas_send_abort+0x21.
> This precludes a dump of course since attempting to do so gives you a
> double-panic (I was wondering why I didn't get a crash dump!); I'll
> re-jigger the box to stick a dump device on an internal SATA device so I
> can successfully get the dump when it happens and see if I can obtain a
> proper crash dump on this.
>
> I think it's fair to assume that 12.0-STABLE should not panic on a disk
> problem (unless of course the problem is trying to page something back
> in -- it's not, the drive that aborts and resets is on a data pack doing
> a scrub)

It shouldn’t panic I imagine.

>>>> mps0: Sending reset from mpssas_send_abort for target ID 37

>> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> Number of Hardware Resets has incremented. There are no other errors shown:

What is _exactly_ that value? Is it related to the number of resets sent
from the HBA _or_ the device resetting by itself?

>> I'd throw possible shade at the backplane or cable /but I have already
>> swapped both out for spares without any change in behavior./

What about the power supply?

Borja.
