From owner-freebsd-acpi@FreeBSD.ORG Tue Jan 15 21:11:02 2008
Date: Tue, 15 Jan 2008 16:10:57 -0500 (EST)
From: Daniel Eischen
To: Kevin Oberman
Cc: acpi@freebsd.org
In-Reply-To: <20080115210206.849E24500E@ptavv.es.net>
References: <20080115210206.849E24500E@ptavv.es.net>
Subject: Re: How to disable acpi thermal?
List-Id: ACPI and power management development

On Tue, 15 Jan 2008, Kevin Oberman wrote:

>> Date: Tue, 15 Jan 2008 15:34:41 -0500 (EST)
>> From: Daniel Eischen
>> Sender: owner-freebsd-acpi@freebsd.org
>>
>> [ Redirected from -current ]
>>
>> On Mon, 14 Jan 2008, Alexandre "Sunny" Kovalenko wrote:
>>
>>>
>>> On Mon, 2008-01-14 at 21:56 -0500, Daniel Eischen wrote:
>>>>
>>>> Thermal zone 0 skyrockets past 110C in a couple of minutes
>>>> when trying to build a kernel. All the other zones stay
>>>> relatively static. I suspect something is wrong somewhere
>>>> because this machine is very lightly loaded and has never
>>>> had a problem until now. I just upgraded it from 4.x to
>>>> 7.0.
>>>
>>> It need not be bogus -- if I turn off the fan on my ThinkPad it will
>>> overheat and shut itself down within a couple of minutes of buildworld,
>>> starting from a relatively cool state. From the look of the stuff below
>>> your fan should kick in no later than 10 seconds after tz0 reaches 77C.
>>> Do you hear it running before shutdown? If yes, maybe lowering the
>>> threshold in AC0 down from 77C will help. If not -- you will need to
>>> figure out who is supposed to turn on the fan. You can dump your ASL
>>> (instructions in the handbook) and post it someplace accessible -- I
>>> will take a look and maybe spot something interesting, but, being far
>>> from an expert in the field, I do not promise too much.
>>
>> I posted the acpidump here:
>>
>> http://people.freebsd.org/~deischen/stl2.iasl
>>
>> The problem is that acpi_thermal keeps shutting down the system
>> after 2 minutes into a buildkernel. The system has no load other
>> than the buildkernel at the time it shuts down.
>>
>> The system is an Intel STL2 Tupelo motherboard with 1 CPU, the
>> other CPU socket being occupied by a CPU terminator thingy.
>> I uncovered the rackmount system and watched it while building
>> a kernel. With the cover off the acpi monitored temperature
>> went to 107C and stayed there. It only took a minute or two
>> to get there. I felt around inside the chassis and nothing
>> was even near being too warm or hot. With the cover on, the
>> temperature goes to 111/112C before being shut down by acpi_thermal
>> (the limit being 110C). There is no way anything in that
>> chassis is anywhere near 100C. I've disabled acpi_thermal
>> for now, but it'd be nice to get a better fix.
>>
>> Any ideas?
>
> Bad CPU or bad support chip? The temperature on modern CPUs is measured
> on the silicon. There is usually a junction that is simply brought out
> to a pair of pins and an external device "reads" the temperature.
>
> It's possible that the chip has a bad junction or support chip that is
> providing bogus information. Most processors appear to have a thermal
> "crowbar" that will kill power if the temperature reaches about
> 135C or something near that. (I have not looked at a spec sheet for
> any CPUs in about three years, so things might have changed.) That is
> outside the control of acpi_thermal, so turning it off may remove alarms
> and prevent a shutdown at _CRT, but that won't prevent a shutdown at the
> higher "meltdown" temperature. That one is intended for loose/missing
> heat sinks or other major thermal failures.

We'll see. I'm doing a buildworld with acpi_thermal disabled, but
with it disabled I can no longer see what the monitored temperature
is.

> It is also possible that there is a BIOS bug that is reporting the
> temperature incorrectly. That seems less likely as it would probably be
> noticed by a lot of folks.
>
> Is there any chance that the heat sink is loose or improperly attached?
> (It happened to me a few years ago.)

Nope, I looked at it, felt it, etc. The CPU isn't hot at all.
110C is well past the boiling point of water, so I should be able to
feel at least some heat from and around the CPU if it were really
running hot.

-- 
DE
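
For anyone reading the archive: the thresholds discussed above (fan at 77C, shutdown at the 110C limit) can be illustrated with a small Python sketch. It is a simplified model and not the acpi_thermal driver code. It assumes the raw reading is in tenths of a kelvin, as the ACPI specification defines thermal zone temperatures; the function names and the sample readings are hypothetical.

```python
# Simplified model of a thermal-zone policy; NOT the acpi_thermal driver.
# Thresholds are the values quoted in this thread for tz0.
AC0_C = 77.0    # active-cooling (fan) trip point mentioned for tz0
CRT_C = 110.0   # critical shutdown limit mentioned for tz0


def decikelvin_to_celsius(raw):
    """Convert a temperature in tenths of a kelvin to degrees Celsius."""
    return raw / 10.0 - 273.15


def classify(raw):
    """Return the action this toy policy takes for a raw reading."""
    temp_c = decikelvin_to_celsius(raw)
    if temp_c >= CRT_C:
        return "critical-shutdown"
    if temp_c >= AC0_C:
        return "fan-on"
    return "ok"


if __name__ == "__main__":
    # Hypothetical raw readings: roughly 50C, 107C and 112C.
    for raw in (3232, 3802, 3852):
        print(raw, round(decikelvin_to_celsius(raw), 1), classify(raw))
```

With these assumed inputs, the reading near 107C is past the fan trip point but below the shutdown limit, while the reading near 112C crosses it, which matches the cover-off and cover-on behaviour described above.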