From: Thomas Zander
To: Jeremy Chadwick
Cc: freebsd-stable
Date: Sun, 2 Oct 2011 09:37:43 +0200
Subject: Re: Interpreting MCA error output
In-Reply-To: <20111001102327.GA37434@icarus.home.lan>

Hello Jeremy,

first, thank you for the extensive explanation. It cleared some things
up for me.
I do have some rambling to add, though :-)

On Sat, Oct 1, 2011 at 12:23, Jeremy Chadwick wrote:
> So what should you do?  Replace the RAM.  Which DIMM?  Sadly I don't
> know how to determine that.  Some system BIOSes (particularly on AMD
> systems I've used) let you do memory tests (similar to memtest86) within
> the BIOS which can then tell you which DIMM slot experienced a problem.
> If yours doesn't have that, I would have to say purchase all new RAM
> (yes, all of it) and test the individual DIMMs later so you can
> determine which is bad.

Well, I wasn't too surprised by the panic. I have read somewhere that in
these situations the kernel might simply panic since the system might be
in a compromised state. So far so ... well ... acceptable.

My question here is how I can be certain right now whether any of the
DIMMs has gone bad. You mentioned problems you have all the time with
DIMMs due to bad cooling in data centers. The machine in question is not
located in a data center; it is my home server, which tends to have very
little load. But being located in my apartment, it is exposed to lots of
_potential_ problems, including unstable power. In fact this was the
first MCA event with these DIMMs ever, in more than a year. But of
course you could be right. A DIMM could be rotten. Absolutely.

Regarding your suggestion to do memory tests: My BIOS does not support
testing, so I booted memtest86+ after reading your e-mail and have let
it run for almost a whole day now. It did not encounter a single
problem. So, even if I bought new DIMMs at once, it might take weeks to
figure out which DIMM is rotten, if any. Assuming that MCA events stay
this infrequent, that is. Of course I'll observe the machine closely,
but if the rate stays at one MCA event per year, it'll take some time to
figure out the broken DIMM :-)

> I should really work with John to make mcelog a FreeBSD port and just
> regularly update it with patches, etc. to work on FreeBSD.  DMI support
> and so on I don't think can be added (at least not by me), but simple
> ASCII decoding?  Very possible.

That would be absolutely helpful! After all, FreeBSD is primarily a
server OS, and where would one have ECC if not on servers? Being able to
determine what's wrong with memory would certainly be very valuable for
many admins.

> An alternative would be for me to make a CGI version where you could
> just go to my site and paste in the FreeBSD MCE and it would siphon it
> through mcelog and give you the output.

I could live with that, too :-)

Thanks again for your extensive explanation, I appreciate it very much!
Now I am going to watch that machine closely...

Best regards,
Riggs