From: Borja Marcos
To: Karl Denninger
Cc: freebsd-stable@freebsd.org
Subject: Re: 9211 (LSI/SAS) issues on 11.2-STABLE
Date: Wed, 6 Feb 2019 16:18:37 +0100
Message-Id: <1FFC1686-E70F-4649-B170-34F90B773918@sarenet.es>
In-Reply-To: <57ddc2f4-681c-e0aa-0484-42cee3876a05@denninger.net>

> On 5 Feb 2019, at 23:49, Karl Denninger wrote:
>
> BTW under 12.0-STABLE (built this afternoon after the advisories came
> out, with the patches) it's MUCH worse. I get the same device resets
> BUT it's followed by an immediate panic which I cannot dump as it
> generates a page-fault (supervisor read data, page not present) in the
> mps *driver* at mpssas_send_abort+0x21.
> This precludes a dump of course since attempting to do so gives you a
> double-panic (I was wondering why I didn't get a crash dump!); I'll
> re-jigger the box to stick a dump device on an internal SATA device so I
> can successfully get the dump when it happens and see if I can obtain a
> proper crash dump on this.
>
> I think it's fair to assume that 12.0-STABLE should not panic on a disk
> problem (unless of course the problem is trying to page something back
> in -- it's not, the drive that aborts and resets is on a data pack doing
> a scrub)

It shouldn’t panic I imagine.

>>>> mps0: Sending reset from mpssas_send_abort for target ID 37

>> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
>> 0x06  0x008  4               6  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> 0x06  0x008  4               7  ---  Number of Hardware Resets
>> 0x06  0x010  4               0  ---  Number of ASR Events
>> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
>>                                 |||_ C monitored condition met
>>                                 ||__ D supports DSN
>>                                 |___ N normalized value
>>
>> Number of Hardware Resets has incremented. There are no other errors shown:

What _exactly_ does that value count? Does it reflect resets sent from
the HBA _or_ the device resetting by itself?

>> I'd throw possible shade at the backplane or cable /but I have already
>> swapped both out for spares without any change in behavior./

What about the power supply?

Borja.
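
A minimal sketch for comparing two snapshots of the statistics table quoted above, so a counter that moved can be lined up against the time of an mps reset. It assumes the rows keep the column layout shown in the quote (page, offset, size, value, flags, description); the thread does not say which command produced them. The names parse_counters and changed are hypothetical helpers, not part of any tool. It only reports which counters changed and does not say who caused the reset.

```python
import re

# One statistics row as quoted in this thread:
# page, offset, size, value, flags, description.
ROW = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_counters(text):
    """Return {description: value} for every statistics row found in text."""
    counters = {}
    for line in text.splitlines():
        line = line.lstrip("> ")  # tolerate mail quoting prefixes
        m = ROW.match(line)
        if m:  # header and flag-legend lines do not match and are skipped
            counters[m.group(6)] = int(m.group(4))
    return counters


def changed(before, after):
    """Return {description: (old, new)} for counters that differ."""
    a, b = parse_counters(before), parse_counters(after)
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


# The two snapshots from the quoted message.
BEFORE = """\
0x06  0x008  4               6  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""

AFTER = """\
0x06  0x008  4               7  ---  Number of Hardware Resets
0x06  0x010  4               0  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
"""

if __name__ == "__main__":
    print(changed(BEFORE, AFTER))
    # {'Number of Hardware Resets': (6, 7)}
```

Taking a snapshot right before and right after each "Sending reset from mpssas_send_abort" message would at least show whether the counter moves once per HBA-initiated reset or at other times as well.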