From: Jeremy Chadwick
To: Freddie Cash
Cc: FreeBSD Stable
Date: Mon, 9 Jan 2012 10:45:21 -0800
Subject: Re: Upgrade from 8.2-STABLE to 9.0-RELEASE wedges on SuperMicro H8DGiF-based system
Message-ID: <20120109184521.GA95985@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

On Mon, Jan 09, 2012 at 09:55:58AM -0800, Freddie Cash wrote:
> On Mon, Jan 9, 2012 at 9:50 AM, John Nielsen wrote:
> > From what you've said I strongly suspect that you have some kind of
> > hardware issue. Dodgy RAM is my first guess, something cooling-related
> > is my 2nd, and PSU is my 3rd.
> > It is a little suspicious that you only started having problems after
> > your upgrade but it could be coincidence or it could be something about
> > the new software tickling the hardware differently than the old.
>
> That's what we're leaning toward as well.  We're planning on doing a
> BIOS upgrade (betadrive is running v2.00 and alphadrive is v1.00),
> then a memtest86+ run, then check firmware on the SATA controllers.

For hardware/system troubleshooting advice:

1) BIOS upgrade -- since this is also what's responsible for ACPI bits
   and other "configuration model" pieces of a system,

2) BIOS settings -- make sure they're all 100% identical between both
   systems,

3) Controller firmware -- please make sure these are the same (your
   controllers between boxes appear to be the same model),

4) Flaky PSU -- possibly voltages drop below or rise above the levels
   the mainboard can handle.  As someone who buys Supermicro exclusively
   for their systems, I can tell you that their PSUs ("Ablecom") are
   quite cheap/horrible.  It's worth purchasing a replacement -- if it
   doesn't turn out to be the problem, you now have a spare PSU (which
   is good to have -- our last system failure was due to a blown PSU).

5) Flaky RAM -- memtest86+ can help here, mostly but not entirely.

6) Flaky mainboard -- it happens.  Really.  :-)

For OS advice:

Compare rc.conf, loader.conf, and so on.  For example, is one system
using powerd(8) while the other isn't?

> If none of the above helps, we're thinking of swapping the CPUs
> between the two systems to see if the problems stay with the box or
> follow the CPU.

I was helping out someone on a public forum earlier this week who
purchased a Dell desktop system that started behaving oddly.  memtest86+
claimed all his DIMMs were bad (regardless of slot), and replacement
DIMMs claimed the same thing.  Dell kept insisting he reload the OS,
else they can try a motherboard swap, blah blah blah.

What amused me was that nobody looked at the CPU: Intel Core i3-550,
which contains an on-die MCH.  Chances are the MCH is going bad, which
means time to replace the CPU.

CPUs rarely go bad, but now with on-die MCHs, on-die VGA, etc. it's
becoming much more plausible that the physical CPU needs to be replaced.
They've become practically computers inside of a computer.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
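
The rc.conf/loader.conf comparison suggested under "For OS advice" can be
sketched roughly as follows.  This is a minimal illustration, not a
tested tool: the function names parse_conf and diff_conf are made up
for this example, the sample settings for alphadrive and betadrive are
hypothetical (the thread does not show either box's config), and it
assumes plain KEY="value" lines with no shell logic in the files.

```python
import shlex


def parse_conf(text):
    """Parse sh-style KEY="value" lines into a dict.

    Blank lines, comment lines, and lines without '=' are skipped.
    Assumes simple assignments only (no shell conditionals or sourcing).
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            # shlex removes the surrounding quotes and any trailing comment.
            parts = shlex.split(value, comments=True)
        except ValueError:
            # Unbalanced quotes: keep the raw text rather than guess.
            parts = [value.strip()]
        settings[key.strip()] = parts[0] if parts else ""
    return settings


def diff_conf(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ.

    None means the key is not set in that file.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    alphadrive = 'zfs_enable="YES"\npowerd_enable="YES"  # power management\n'
    betadrive = '# betadrive\nzfs_enable="YES"\n'

    differences = diff_conf(parse_conf(alphadrive), parse_conf(betadrive))
    for key, (left, right) in differences.items():
        print(f"{key}: alphadrive={left!r} betadrive={right!r}")
```

In practice the two strings would be replaced with the contents of each
box's /etc/rc.conf (or /boot/loader.conf), copied to one machine and
read with open().  Comparing parsed settings rather than running a
plain diff(1) ignores ordering and comment differences, so only real
configuration differences such as powerd(8) being enabled on one box
show up.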