Date:      Sun, 19 Jun 2011 18:11:31 +0900
From:      Stephane LAPIE <stephane.lapie@darkbsd.org>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: Problem with a LSILogic SAS/SATA adapter on 8.2-STABLE/ZFSv28
Message-ID:  <4DFDBD43.4020606@darkbsd.org>
In-Reply-To: <20110618144536.GA15627@icarus.home.lan>
References:  <4DFCB12A.6030805@darkbsd.org> <20110618144536.GA15627@icarus.home.lan>

On 06/18/2011 11:45 PM, Jeremy Chadwick wrote:
> For readers, the NMI and RAM parity error message in question is
> shown here:
>
> http://www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt
>
> But is difficult to decode due to the well-established problem with the
> FreeBSD kernel interspersing text output.  (I imagine this gets worse
> the more cores you have on your system, but that's not relevant to this
> discussion)

Nothing a quick grep on the source tree couldn't fix, but yeah, annoying :)

> Anyway, to expand on the "RAM parity error" and NMI message: this
> information I'm going to give you isn't specific to the LSI controller;
> it's a general piece of information.  I've talked about this in the
> past.  Please read it and focus on the SERR/PERR and NMI details:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010938.html

I see. Thanks for the extra bit of info.

> If you want to rule out actual system RAM issues, I would recommend
> running memtest86 for about 30 minutes, and then memtest86+ for the same
> amount of time.  This might sound crazy ("why can't I just run one?!"),
> but you need to review the ChangeLog for memtest86 to see why.  Their
> support for detecting corrected ECC errors was removed with 4.0, but in
> 4.0 they added multi-CPU support (which is good to have in this
> situation), while memtest86 may still have support for ECC.
>
> Neither of these utilities are as excellent as a hardware RAM tester
> (which does cool things like sending extreme amounts of voltage through
> each DRAM module, looks for soft and hard errors, etc.), but those are
> expensive.  Usually system memory problems will show up in memtest86/86+
> pretty quickly though.

I am currently rebuilding a pool, so the memory tests will have to wait
until that is done. I will run them just to be on the safe side, but I
think I have actually narrowed the problem down to the controller.

> All that said: it may be possible that the NMIs you're seeing aren't
> being induced by system RAM issues at all, but somehow are being
> generated or caused by the LSI controller.  I wasn't under the
> impression that a PCIe MSI and/or MSI-X generated an NMI, but I could be
> completely wrong.

The kernel panics would pop up at random intervals, sometimes every 10
minutes, sometimes only after an hour or more. They were probably
stress-induced, and the common point in every one of them was that one
processor was handling an interrupt for mpt0.

So, I put another controller back in:

mvs0: <Marvell 88SX6081 SATA controller> port 0x3000-0x30ff mem 0xdf200000-0xdf2fffff irq 24 at device 1.0 on pci7
mvs0: Gen-II, 8 3Gbps ports, Port Multiplier supported
mvs0: [ITHREAD]

which did not exhibit this behavior.


By the way, for reference, the controller I had been using is a PCI-X
one, using a SAS-1068R chipset.

Here is a picture of the controller in case anyone is familiar with it:
http://www.darkbsd.org/~darksoul/fujitsu-siemens-lsi-sas1068.JPG

This is a Fujitsu OEM board with an LSI chip, so I guess it *might* have
some firmware quirks or something, making it unfit for FreeBSD.

> P.S. -- In the future, try to avoid cross-posting.  :-)

Sorry about that. m(_ _)m
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo

