From: Pete French
Date: Thu, 15 Jan 2009 15:58:56 +0000
To: rwatson@FreeBSD.org
Cc: freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject: Re: Big problems with 7.1 locking up :-(

> Given the inconsistency of the symptoms, I wouldn't preclude something
> environmental: could it be that it was the bottom, or more likely, top box in
> a rack and that your air conditioning isn't quite as effective there when the
> outside temperature is above/below some threshold?

It's a possibility - but the two machines which were exhibiting the fault
are in Slough and Baton Rouge respectively, so under very different climatic
conditions. However, something has changed to make it stop locking up!
The USA one was doing it every couple of hours at the start of the week, and
the UK one wouldn't last more than half an hour at one point.

> Alternatively, could it be that the workload changed very slightly -- you're
> doing less DNS queries, or the network latency to the DNS server changed?

Also a possibility - that workload is entirely dependent on customer
behaviour, which is an unpredictable beast!

> Certainly, whoever gave the advise on checking BIOS revisions is right: you
> can spend a lot of time tracking down a bug to realize that one box has a
> slightly different BIOS rev and therefore does/doesn't suffer from an obscure
> SMI bug.

Yes, that's next on my list - make sure they are all on the same version.

> In any case, if it starts to reproduceably recur, send out mail and we can see
> if we can track it down some more. BTW, did you establish if the version of
> iLo you have has a remote NMI? I seem to recall that some do, and being able
> to deliver an NMI is really quite valuable.

OK, thanks. My iLO2 appears to have the ability to generate an NMI on
demand, so that could be used if/when the fault crops up again.

Thanks, will let this lie for now and resurrect the thread when I can get
some more useful data.

-pete.