From owner-freebsd-acpi@FreeBSD.ORG Sun Sep 17 14:26:46 2006
Return-Path:
X-Original-To: acpi@freebsd.org
Delivered-To: freebsd-acpi@FreeBSD.ORG
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 0175416A47C
	for ; Sun, 17 Sep 2006 14:26:46 +0000 (UTC)
	(envelope-from Alex.Kovalenko@verizon.net)
Received: from vms042pub.verizon.net (vms042pub.verizon.net [206.46.252.42])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C389043D5C
	for ; Sun, 17 Sep 2006 14:26:44 +0000 (GMT)
	(envelope-from Alex.Kovalenko@verizon.net)
Received: from RabbitsDen ([70.21.201.244])
	by vms042.mailsrvcs.net
	(Sun Java System Messaging Server 6.2-4.02 (built Sep 9 2005))
	with ESMTPA id <0J5Q001LUQSE9IM0@vms042.mailsrvcs.net>
	for acpi@freebsd.org; Sun, 17 Sep 2006 09:26:39 -0500 (CDT)
Date: Sun, 17 Sep 2006 10:26:16 -0400
From: "Alexandre \"Sunny\" Kovalenko"
In-reply-to: <20060916234642.GC698@bunrab.catwhisker.org>
To: David Wolfskill
Message-id: <1158503176.754.26.camel@RabbitsDen>
MIME-version: 1.0
X-Mailer: Evolution 2.6.3 FreeBSD GNOME Team Port
Content-type: multipart/mixed; boundary="Boundary_(ID_O58A4aArqvfZgtRbBK518g)"
References: <20060916234642.GC698@bunrab.catwhisker.org>
Cc: acpi@freebsd.org
Subject: Re: Avoiding "WARNING: system temperature too high, shutting down soon!"?
X-BeenThere: freebsd-acpi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: ACPI and power management development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 17 Sep 2006 14:26:46 -0000

--Boundary_(ID_O58A4aArqvfZgtRbBK518g)
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: 8BIT

On Sat, 2006-09-16 at 16:46 -0700, David Wolfskill wrote:
> I could use some help: I seem to overheat my laptop; I'd like to get
> some idea of how to avoid the overheating, preferably while still
> getting the work done.
>
> The laptop is a Dell Inspiron 8200. I recently bought this one to
> replace a 1.6 GHz one that had developed an occasional problem with
> the LCD display that made the display unusable (though I could SSH in to
> the machine usually). This machine is a 2.4 GHz P4M with 768 MB RAM (at
> the moment).
>
> During Nate's BAFUG talk earlier this month, I decided to try running
> powerd; I set the mode at "adaptive" for AC, battery, and unknown, and
> dev.cpu.0.freq reports that it normally sits at 150, but appears to ramp
> up quite responsively during, say, a "make buildworld." (The earlier
> laptop sits at dev.cpu.0.freq=1600 during that process; the current one
> sits at 2400 -- as expected).
>
> However, the temperature (as reported by hw.acpi.thermal.tz0.temperature),
> which meanders between 52 - 62C while the machine isn't doing much,
> tends to spend long stretches of time in the 80 - 90C range during a
> "make buildworld" (as reported by a "while (1)" loop during said
> process). As you can see from the salient sysctl values, that's not a
> lot of headroom:
>
> g1-18(6.2-P)[4] sysctl hw.acpi.thermal dev.cpu.0
> hw.acpi.thermal.min_runtime: 0
> hw.acpi.thermal.polling_rate: 10
> hw.acpi.thermal.user_override: 0
> hw.acpi.thermal.tz0.temperature: 58.5C
> hw.acpi.thermal.tz0.active: -1
> hw.acpi.thermal.tz0.passive_cooling: 0
> hw.acpi.thermal.tz0.thermal_flags: 0
> hw.acpi.thermal.tz0._PSV: -1
> hw.acpi.thermal.tz0._HOT: -1
> hw.acpi.thermal.tz0._CRT: 94.0C
> hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
> dev.cpu.0.%desc: ACPI CPU
> dev.cpu.0.%driver: cpu
> dev.cpu.0.%location: handle=\_PR_.CPU0
> dev.cpu.0.%pnpinfo: _HID=none _UID=0
> dev.cpu.0.%parent: acpi0
> dev.cpu.0.freq: 150
> dev.cpu.0.freq_levels: 2400/0 2100/0 1800/0 1500/0 1200/0 1050/0 900/0 750/0 600/0 450/0 300/0 150/0
> g1-18(6.2-P)[5]
>
> leading to:
>
> Sep 16 10:11:43 localhost root: WARNING: system temperature too high, shutting down soon!
> Sep 16 10:11:43 localhost syslogd: /dev/:0: No such file or directory
> Sep 16 10:11:49 localhost kernel: acpi_tz0: WARNING - current temperature (94.5C) exceeds safe limits
> Sep 16 10:11:55 localhost syslogd: exiting on signal 15
>
> this morning while I was running yesterday's -CURRENT, building today's.
> (I had already built today's -STABLE, aka -6.2-PRERELEASE successfully.)
>
> And that's the work that I'd like to be able to do: track RELENG_6
> & HEAD on a daily basis. With a few interruptions, mostly from
> events not of my choosing, I've been doing this with various machines,
> including laptops, for some years.
>
> I suppose it's possible that the cooling just isn't adequate for the
> machine, though each of the 2 fans appears to operate. (Each has a
> "high", "low", and "off" setting; one fan is for the CPU; the other is
> for the motherboard -- per Dell's diagnostics. The motherboard fan
> does make an odd sound sometimes, though the diagnostics claim that it
> was running fine.)
>
> Just prior to the forced shutdown (above), the reported temperature
> had been >90C for several minutes, and the fans were going full
> bore. I had elevated the laptop above a smooth flat surface, then
> put a bag of ice under it -- apparently to no avail.
>
> So: in the face of prolonged near-critical temperatures, is there a way
> to tell the machine to throttle back & work a bit less hard? Of course,
> if there's a way to make the cooling more effective, I'd certainly be
> interested in that, as well -- but having the machine shut down like
> that is awfully disruptive. :-/
>
> Please include me in responses, as ACPI isn't one of the things I follow
> closely enough to subscribe to the list.
>
> I will, of course, summarize responses sent off-list that appear to be
> useful.
>
> Thanks!
>
> Peace,
> david

I have attached a patch, which I put together in the days of 6-CURRENT
(I think), that adds a -t switch to powerd.
The patch makes powerd drop the CPU frequency once the temperature given
with -t is reached. Unfortunately, I no longer have a 6.x system to try
it on, so if the patch does not apply, you can either add the necessary
code by hand or send me your version of
/usr/src/usr.sbin/powerd/powerd.c and I will modify it appropriately.

Since it does not look like you have _ACx levels configured in your ASL,
it is possible that your BIOS has a "Fan learning" option. In this mode
the CPU is run at different frequencies and under different loads, and
the fan speed is adjusted to keep the temperature at a certain level.
Obstructing the air flow (by partially blocking the air holes) during
learning mode will usually result in a cooler, but noisier, system.

When I had a similar problem with my laptop, applying a moderate amount
of thermal grease and reseating the CPU fan fixed it for good. I would
also recommend investigating the source of the noise you mentioned --
a mechanical obstacle in the path of the fan might cause the fan itself
to heat up at the most inopportune moment (read: at the highest speeds).

Additionally, if you are pretty sure that your hardware can withstand
higher temperatures, you can always override the _CRT value in your ASL.
See the appropriate Handbook section on how to dump your ASL, and then
search for something like

    Method (_CRT, 0, NotSerialized)
    {
        Return (KELV (0x5d))
    }

The return value is in tenths of a degree on the Kelvin scale. I,
personally, would not do that.

And last, but not least -- an Antec coolpad (active, USB powered) is
buildworld's best friend. Even if your laptop handles temperature
properly, replacing the coolpad is much cheaper and easier than
replacing a fan that has died because it was running full bore for far
too long.

HTH,

-- 
Alexandre Kovalenko (Олександр Коваленко)

--Boundary_(ID_O58A4aArqvfZgtRbBK518g)
Content-type: text/x-patch; name=powerd.c.patch; charset=utf-8
Content-transfer-encoding: 7BIT
Content-disposition: attachment; filename=powerd.c.patch

--- ./usr.sbin/powerd/powerd.c	Sun Apr 17 11:25:41 2005
+++ /home/sunny/powerd.c	Sun Apr 24 20:46:40 2005
@@ -46,7 +46,8 @@
 
 #define DEFAULT_ACTIVE_PERCENT 65
 #define DEFAULT_IDLE_PERCENT 90
-#define DEFAULT_POLL_INTERVAL 500 /* Poll interval in milliseconds */
+#define DEFAULT_POLL_INTERVAL 500 /* Poll interval in milliseconds */
+#define VERY_HIGH_TEMPERATURE 200
 
 enum modes_t {
 	MODE_MIN,
@@ -83,11 +84,13 @@
 static int freq_mib[4];
 static int levels_mib[4];
 static int acline_mib[3];
+static int temp_mib[5];
 
 /* Configuration */
 static int cpu_running_mark;
 static int cpu_idle_mark;
 static int poll_ival;
+static int passive_cooling_mark;
 
 static int apm_fd;
 static int exit_requested;
@@ -244,7 +247,7 @@
 {
 
 	fprintf(stderr,
-"usage: powerd [-v] [-a mode] [-b mode] [-i %%] [-n mode] [-p ival] [-r %%]\n");
+"usage: powerd [-v] [-a mode] [-b mode] [-i %%] [-n mode] [-p ival] [-r %%] [-t temperature]\n");
 	exit(1);
 }
 
@@ -252,7 +255,7 @@
 main(int argc, char * argv[])
 {
 	long idle, total;
-	int curfreq, *freqs, i, *mwatts, numfreqs;
+	int curfreq, *freqs, i, *mwatts, numfreqs, temperature;
 	int ch, mode_ac, mode_battery, mode_none, acline, mode, vflag;
 	uint64_t mjoules_used;
 	size_t len;
@@ -263,10 +266,11 @@
 	cpu_idle_mark = DEFAULT_IDLE_PERCENT;
 	poll_ival = DEFAULT_POLL_INTERVAL;
 	mjoules_used = 0;
+	passive_cooling_mark = VERY_HIGH_TEMPERATURE;
 	vflag = 0;
 	apm_fd = -1;
 
-	while ((ch = getopt(argc, argv, "a:b:i:n:p:r:v")) != EOF)
+	while ((ch = getopt(argc, argv, "a:b:i:n:p:r:t:v")) != EOF)
 		switch (ch) {
 		case 'a':
 			parse_mode(optarg, &mode_ac, ch);
@@ -300,6 +304,16 @@
 				usage();
 			}
 			break;
+		case 't':
+			passive_cooling_mark = atoi(optarg);
+			if(passive_cooling_mark < 0 || passive_cooling_mark > 100) {
+				warnx("%d is not valid temperature for passive cooling",
+				    passive_cooling_mark);
+				usage();
+			}
+			passive_cooling_mark *= 10;
+			passive_cooling_mark += 2733;
+			break;
 		case 'v':
 			vflag = 1;
 			break;
@@ -320,6 +334,9 @@
 	len = 4;
 	if (sysctlnametomib("dev.cpu.0.freq_levels", levels_mib, &len))
 		err(1, "lookup freq_levels");
+	len = 5;
+	if (sysctlnametomib("hw.acpi.thermal.tz0.temperature", temp_mib, &len))
+		err(1, "lookup temperature");
 
 	/* Check if we can read the idle time and supported freqs. */
 	if (read_usage_times(NULL, NULL))
@@ -370,6 +387,10 @@
 		len = sizeof(curfreq);
 		if (sysctl(freq_mib, 4, &curfreq, &len, NULL, 0))
 			err(1, "error reading current CPU frequency");
+		/* Read current temperature. */
+		len = sizeof(temperature);
+		if(sysctl(temp_mib, 5, &temperature, &len, NULL, 0))
+			err(1, "error reading current temperature");
 
 		if (vflag) {
 			for (i = 0; i < numfreqs; i++) {
@@ -410,12 +431,31 @@
 					err(1, "error setting CPU freq %d",
 					    freqs[0]);
 			}
+			/* Check for passive cooling override */
+			if(temperature > passive_cooling_mark) {
+				if (vflag) {
+					printf("passive cooling override; "
+					    "changing frequency to %d MHz\n",
+					    freqs[numfreqs - 1]);
+				}
+				if (set_freq(freqs[numfreqs - 1]))
+					err(1, "error setting CPU freq %d",
+					    freqs[numfreqs - 1]);
+			}
 			continue;
 		}
 
 		/* Adaptive mode; get the current CPU usage times. */
 		if (read_usage_times(&idle, &total))
 			err(1, "read_usage_times");
+		/*
+		 * If temperature has risen over passive cooling mark, we
+		 * would want to decrease frequency regardless of the load,
+		 * Simplest way to go about this would be to report 100%
+		 * idle CPU and let adaptive algorithm do its job.
+		 */
+		if(temperature > passive_cooling_mark)
+			idle = total;
 
 		/*
 		 * If we're idle less than the active mark, jump the CPU to

--Boundary_(ID_O58A4aArqvfZgtRbBK518g)--