From: Robert Watson <rwatson@FreeBSD.org>
Date: Thu, 15 Jan 2009 15:48:18 +0000 (GMT)
To: Pete French
Cc: freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject: Re: Big problems with 7.1 locking up :-(

On Thu, 15 Jan 2009, Pete French wrote:

> Just an update on this - I tried the various kernels, but now the machine is
> not locking up at all. As I haven't actually changed anything, this does
> not make me as happy as you might expect. I don't know what to do now - I
> dare not upgrade the machines to an OS that I know locks, but if I can't
> make it lock then it is impossible to get any useful debugging info out of it.
> Maybe waiting for 7.2 is the best move...
Well, one slightly pessimistic (or realistic) view says that all software
contains bugs; it's just a question of whether or not your workload and
environment trigger those bugs in a noticeable way.

Given the inconsistency of the symptoms, I wouldn't rule out something
environmental: could it be that it was the bottom or, more likely, the top box
in a rack, and that your air conditioning isn't quite as effective there when
the outside temperature is above or below some threshold? Alternatively,
could the workload have changed very slightly: are you doing fewer DNS
queries, or has the network latency to the DNS server changed?

Certainly, whoever gave the advice on checking BIOS revisions is right: you
can spend a lot of time tracking down a bug only to realize that one box has
a slightly different BIOS revision and therefore does or doesn't suffer from
an obscure SMI bug.

In any case, if it starts to recur reproducibly, send out mail and we can see
if we can track it down some more.

BTW, did you establish whether the version of iLO you have supports a remote
NMI? I seem to recall that some do, and being able to deliver an NMI is
really quite valuable.

Robert N M Watson
Computer Laboratory
University of Cambridge