Date: Thu, 6 Mar 2008 23:43:31 +0700 (KRAT)
From: Eugene Grosbein <eugen@kuzbass.ru>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/121433: [cpufreq] kern_cpu.c's logic error leads to spontaneous disabling of passive cooling
Message-ID: <200803061643.m26GhVBU005478@delikates-nk.ru>
Resent-Message-ID: <200803061700.m26H042m004891@freefall.freebsd.org>

>Number:         121433
>Category:       kern
>Synopsis:       [cpufreq] kern_cpu.c's logic error leads to spontaneous disabling of passive cooling
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 06 17:00:03 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Eugene Grosbein
>Release:        FreeBSD 6.3-PRERELEASE i386
>Organization:
Svyaz-Service JSC
>Environment:
System: FreeBSD 6.3-PRERELEASE, Pentium-4 2.0GHz

>Description:
	I have a 1U uniprocessor FreeBSD 6.3-PRERELEASE server whose
	active cooling is inadequate, which leads to CPU overheating.
	The server is remote, and while better cooling is being prepared
	I decided to use the passive cooling feature of acpi_thermal(4).
	It uses p4tcc here and really helps to keep the CPU temperature
	in bounds, but there is an annoying bug: very often (many times
	per hour) acpi_thermal(4) disables passive cooling with the message:

	failed to set new freq, disabling passive cooling

	So I need to use cron to (re)enable passive cooling once a minute
	to keep it running.

	I've tracked this down to src/sys/kern/kern_cpu.c,
	function cf_get_method():

	1) src/sys/dev/acpica/acpi_thermal.c, function acpi_tz_cooling_thread(),
	calls acpi_tz_cpufreq_update() from the same file;

	2) acpi_tz_cpufreq_update() calls CPUFREQ_GET(), which takes us
	to src/sys/kern/kern_cpu.c, cf_get_method();

	3) cf_get_method() has the following code:

	/*
	 * Reacquire the lock and search for the given level.
	 *
	 * XXX Note: this is not quite right since we really need to go
	 * through each level and compare both absolute and relative
	 * settings for each driver in the system before making a match.
	 * The estimation code below catches this case though.
	 */
	CF_MTX_LOCK(&sc->lock);
	for (n = 0; n < numdevs && curr_set->freq == CPUFREQ_VAL_UNKNOWN; n++) {
		if (!device_is_attached(devs[n]))
			continue;
		error = CPUFREQ_DRV_GET(devs[n], &set);
		if (error)
			continue;
		for (i = 0; i < count; i++) {
			if (CPUFREQ_CMP(set.freq, levels[i].total_set.freq)) {
				sc->curr_level = levels[i];
				break;
			}
		}
	}

	Note that the error value is not cleared after this loop.
	In my case it happens to be ENXIO when the loop ends.
	The later code successfully reports:

	CF_DEBUG("get estimated freq %d\n", curr_set->freq);

	(curr_set->freq always happens to be the maximum CPU frequency here)

	Then it does 'return (error);' with the value ENXIO propagated
	from the loop shown above.

	4) acpi_tz_cpufreq_update() propagates ENXIO to acpi_tz_cooling_thread(),
	which disables passive cooling.

>How-To-Repeat:
	Just use a uniprocessor Pentium-4 system with heavy constant CPU load
	and acpi_thermal/cpufreq/p4tcc, and tune acpi_thermal so that passive
	cooling gets used. Here is my /etc/sysctl.conf:

	debug.cpufreq.lowest=1246
	#debug.cpufreq.verbose=1
	hw.acpi.thermal.user_override=1
	hw.acpi.thermal.tz0.passive_cooling=1
	hw.acpi.thermal.tz0._PSV=70C
	hw.acpi.thermal.tz0._CRT=75C

>Fix:
	Unknown. Perhaps just clear the error value after the code cited above?
	As a workaround, I've patched acpi_thermal(4) to not disable passive
	cooling when acpi_tz_cpufreq_update() returns ENXIO; that works for me.

Eugene Grosbein
>Release-Note:
>Audit-Trail:
>Unformatted:
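
The stale error value described in step 3 can be reproduced outside the kernel. What follows is only a minimal userland sketch, not the kernel code: get_freq(), drv_get_fn, drv_fails(), drv_reports_2000(), FREQ_UNKNOWN and the clear_error flag are all hypothetical names made up for illustration, and the estimation step is assumed to always succeed. It shows how a failing driver query leaves ENXIO in `error` even though a frequency is found afterwards, and how clearing the value after the loop (the change tentatively suggested under >Fix:) would avoid that.

```c
#include <errno.h>
#include <stddef.h>

#define FREQ_UNKNOWN (-1)

/* Hypothetical stand-in for a cpufreq driver's "get current setting" call. */
typedef int (*drv_get_fn)(int *freq);

/* A driver that cannot report its setting, like the one seen in the PR. */
int
drv_fails(int *freq)
{
	(void)freq;
	return (ENXIO);
}

/* A driver that reports 2000 MHz successfully. */
int
drv_reports_2000(int *freq)
{
	*freq = 2000;
	return (0);
}

/*
 * Same shape as the loop quoted from cf_get_method(): poll each driver,
 * skip the ones that fail, then fall back to an estimate.  With
 * clear_error == 0 the last driver error leaks into the return value;
 * with clear_error != 0 it is reset once the loop is done.
 */
int
get_freq(const drv_get_fn *drvs, size_t ndrvs, const int *levels,
    size_t nlevels, int estimate, int clear_error, int *out)
{
	int error = 0;
	int cur = FREQ_UNKNOWN;
	size_t i, n;

	for (n = 0; n < ndrvs && cur == FREQ_UNKNOWN; n++) {
		int freq = FREQ_UNKNOWN;

		error = drvs[n](&freq);
		if (error)
			continue;	/* error keeps this value if nothing overwrites it */
		for (i = 0; i < nlevels; i++) {
			if (freq == levels[i]) {
				cur = levels[i];
				break;
			}
		}
	}
	if (clear_error)
		error = 0;		/* the suggested change */

	/* Assumed-infallible estimation fallback. */
	if (cur == FREQ_UNKNOWN)
		cur = estimate;

	*out = cur;
	return (error);
}
```

Called with only drv_fails() in the driver list and clear_error set to 0, get_freq() stores the estimated frequency in *out yet still returns ENXIO, which is the combination that makes the caller disable passive cooling. With clear_error set to 1 the same call returns 0. Whether an unconditional reset is right for the real cf_get_method() depends on whether its estimation code can fail, which this sketch does not model.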