From owner-freebsd-hardware@FreeBSD.ORG Sun Jun 19 09:11:43 2011
Message-ID: <4DFDBD43.4020606@darkbsd.org>
Date: Sun, 19 Jun 2011 18:11:31 +0900
From: Stephane LAPIE
To: Jeremy Chadwick
Cc: freebsd-hardware@freebsd.org
Subject: Re: Problem with a LSILogic SAS/SATA adapter on 8.2-STABLE/ZFSv28
References: <4DFCB12A.6030805@darkbsd.org> <20110618144536.GA15627@icarus.home.lan>
In-Reply-To: <20110618144536.GA15627@icarus.home.lan>

On 06/18/2011 11:45 PM, Jeremy Chadwick wrote:
> For readers, the NMI and RAM parity error message in question is
> shown here:
>
> http://www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt
>
> But is difficult to decode due to the well-established problem with the
> FreeBSD kernel interspersing text output.
> (I imagine this gets worse
> the more cores you have on your system, but that's not relevant to this
> discussion)

Nothing a quick grep on the source tree couldn't fix, but yeah, annoying :)

> Anyway, to expand on the "RAM parity error" and NMI message: this
> information I'm going to give you isn't specific to the LSI controller;
> it's a general piece of information. I've talked about this in the
> past. Please read it and focus on the SERR/PERR and NMI details:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010938.html

I see. Thanks for the extra bit of info.

> If you want to rule out actual system RAM issues, I would recommend
> running memtest86 for about 30 minutes, and then memtest86+ for the same
> amount of time. This might sound crazy ("why can't I just run one?!"),
> but you need to review the ChangeLog for memtest86 to see why. Their
> support for detecting corrected ECC errors was removed with 4.0, but in
> 4.0 they added multi-CPU support (which is good to have in this
> situation), while memtest86 may still have support for ECC.
>
> Neither of these utilities are as excellent as a hardware RAM tester
> (which does cool things like sending extreme amounts of voltage through
> each DRAM module, looks for soft and hard errors, etc.), but those are
> expensive. Usually system memory problems will show up in memtest86/86+
> pretty quickly though.

I am currently rebuilding a pool, so the memory test will have to wait
until that is done. I will run it just to be on the safe side, but I think
I have actually narrowed the problem down to the controller.

> All that said: it may be possible that the NMIs you're seeing aren't
> being induced by system RAM issues at all, but somehow are being
> generated or caused by the LSI controller. I wasn't under the
> impression that a PCIe MSI and/or MSI-X generated an NMI, but I could be
> completely wrong.

The kernel panics occurred at random intervals, sometimes every 10 minutes,
sometimes every hour or several hours. They were probably stress-induced,
and the common factor in each one was that one processor was handling an
interrupt for mpt0.

So I put another controller back in:

mvs0: port 0x3000-0x30ff mem 0xdf200000-0xdf2fffff irq 24 at device 1.0 on pci7
mvs0: Gen-II, 8 3Gbps ports, Port Multiplier supported
mvs0: [ITHREAD]

This one did not exhibit the behavior.

For reference, the controller I had been using is a PCI-X one with a
SAS-1068R chipset. Here is a picture of it in case anyone is familiar
with it:

http://www.darkbsd.org/~darksoul/fujitsu-siemens-lsi-sas1068.JPG

This is a Fujitsu OEM board with an LSI chip, so I guess it *might* have
firmware quirks that make it unfit for FreeBSD.

> P.S. -- In the future, try to avoid cross-posting. :-)

Sorry about that. m(_ _)m

-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo