From: Nate Lawson
Date: Tue, 15 Feb 2005 00:13:05 -0800
To: Kevin Oberman
Cc: acpi@freebsd.org
Subject: Re: HEADSUP: cpufreq import complete, acpi_throttling changed
Message-ID: <4211AF11.7050109@root.org>
In-Reply-To: <20050214214422.C9F925D07@ptavv.es.net>

Kevin Oberman wrote:
>>Date: Mon, 14 Feb 2005 22:19:48 +0100
>>From: Pawel Worach
>>
>>Hi, sorry for the delay. I bumped the number of retries to 2000 and I
>>can still reproduce the error if the CPU has some load; I believe that
>>is expected. Even when "idle" (GNOME desktop running) it works fine
>>with 100; I think the first time I tested it I had mplayer running. I
>>can't see a real-life reason for bumping the number of retries: from
>>all speeds above 200 MHz I can step back up to 1.7 GHz without problems
>>under light CPU load.
>>The power_profile script should probably have a min limit; 75 MHz is
>>ridiculous :)

Ok, no problem.

>>Another cool thing would be if the speed could be stepped
>>automagically based on current battery level; that would likely be the
>>job for a powerd(8).
>
> I've been thinking about this, too,

Strangely, I've thought about this for 2 years. ;-)

> and I think that stepping things down
> with battery level is not the answer. I think it MIGHT make sense to do
> so based on battery discharge rate. This would allow a user to configure
> an approximate battery "lifetime". It is especially important as
> batteries wear and, if two batteries are present, one discharges faster
> than the other.

I don't think battery level or discharge rate is a useful control input.
Think about having your laptop sit on your desk for an hour and then
wanting to buildworld for an hour. You certainly want the system to
power everything possible down for the first hour and run as fast as
possible the second hour.

The factor almost everyone gets wrong is the integral:

    Nate Efficiency = Useful Work Done / Amp-hours burned
    Total amp-hours = Sum[t: 0...Tdead](PowerUsed(t))

This formula explains why you should run the CPU at full speed whenever
there is work to be done (increasing the numerator) and run it as low as
possible when idle (decreasing the loss to the denominator). The
denominator is ultimately fixed (you only have so much battery). On AC
power, the denominator is infinite (meaning Nate is never very
efficient). In this case, thermal (and hence noise) issues become the
main concern. You also want to keep your power bill low, so conserving
AC power is a minor but valid concern (think: server farm).

I think the control inputs should be current system load and thermal
level. The system load control function would take the current
instantaneous load, all previously measured loads, and the current CPU
frequency level, and output its desired frequency.
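A minimal sketch of the load-driven control function described above. It assumes an exponentially weighted moving average as the summary of "all previously measured loads", plus hypothetical thresholds and frequency levels; `LoadGovernor`, `desired_freq`, `alpha`, `busy` and `idle` are illustrative names, not powerd(8) or cpufreq interfaces. Following the efficiency argument, it jumps straight to the highest level when the smoothed load says there is work to do and drops to the lowest level when the system is idle:

```python
# Hypothetical load-driven frequency selector; not powerd(8) or cpufreq code.
from dataclasses import dataclass


@dataclass
class LoadGovernor:
    levels_mhz: list[int]      # available frequency levels (hypothetical)
    alpha: float = 0.5         # weight of the newest load sample in the average
    busy: float = 0.75         # smoothed load at or above this => full speed
    idle: float = 0.25         # smoothed load at or below this => lowest speed
    avg_load: float = 0.0      # running summary of all previous samples

    def desired_freq(self, load: float, cur_mhz: int) -> int:
        """Fold in one load sample (0.0..1.0) and return the desired MHz."""
        # The moving average lets every earlier sample influence the
        # decision while damping single spikes.
        self.avg_load = self.alpha * load + (1.0 - self.alpha) * self.avg_load
        if self.avg_load >= self.busy:
            return max(self.levels_mhz)   # work to do: run as fast as possible
        if self.avg_load <= self.idle:
            return min(self.levels_mhz)   # idle: run as slow as possible
        return cur_mhz                    # in between: hold the current level


if __name__ == "__main__":
    gov = LoadGovernor(levels_mhz=[600, 800, 1000, 1200, 1400, 1700])
    print(gov.desired_freq(1.0, 600))    # 600  (average 0.5, hold)
    print(gov.desired_freq(1.0, 600))    # 1700 (average 0.75, busy)
    print(gov.desired_freq(0.0, 1700))   # 1700 (average 0.375, hold)
    print(gov.desired_freq(0.0, 1700))   # 600  (average 0.1875, idle)
```

The gap between the `busy` and `idle` thresholds acts as a dead band, so a load hovering around one value does not make the frequency flap between levels on every sample.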

The thermal function would take into account the temperature of each
zone, currently active coolers, the desired temperature, and the current
CPU frequency, and output its desired frequency. A weighting value would
allow the user to select which factor dominates the decision.

All this is a good research project. Don't forget to throw in sleep
states (e.g. S1 or S3) and disk spindown if you want to be complete. The
Linux "laptop mode" patches are a good example of how to do some of this
right, as is iBook behavior.

> The other issue is thermal. I would assume that the frequency should be
> decreased when _PSV is reached, but should it continue to drop the
> frequency until the temperature stabilizes or until it drops to _PSV? I
> believe the latter is a better choice, especially as the effect is not
> quite instantaneous, and since it is only read by ACPI at fairly long
> intervals. This means that the adjustment should not be too aggressive,
> to prevent continual oscillation of the frequency and temperature.
>
> And when do you start increasing the frequency again when temperature
> drops? Once again, you want to reach thermal stability and not
> oscillate around _PSV (or at least do so slowly). As there is probably
> substantial variation between systems, a settable hysteresis is
> probably needed for really good results. (This gets worse for systems
> which don't support both throttling and frequencies.)

_PSV should be implemented in acpi_thermal. It would control the
frequency through CPUFREQ_GET/SET. That's one of the main reasons I
added both user and kernel interfaces. acpi_thermal doesn't have to know
which cpufreq devices are on the system; it can just use whatever is
there. If you read the section of the ACPI spec on _PSV, you'll see it
offers an equation, plus methods (_TC1, _TC2, _TSP) for the BIOS to
supply the appropriate coefficients, for getting good hysteresis.

> And, should TCC be folded into the equation for P4 systems? After all,
> that's what it's for.
> I don't see any way to set TCC to automatic at the
> moment, but that could be a significant tool in thermal stability.
> (There may be a way, but I didn't see it in the sources.)

I'm soon going to move p4tcc to be another relative cpufreq driver. It
will be under manual control, although the driver is free to implement
some hidden ultimate limit via automatic control to keep the chip from
melting. I think TCC already has that non-configurable feature in
hardware no matter what we do.

Whatever the case, I think optional cpufreq management (i.e. powerd)
should be done in usermode. This allows it to make complex decisions and
link with lots of components (want to coordinate with a cluster over the
network? Sure!). If it crashes, the system just uses more power or is
slow until a user restarts it. However, thermal or other emergency uses
of cpufreq should be in the kernel and use the higher priorities so that
the system doesn't melt down when a fan dies.

--
Nate
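For reference, the passive cooling equation that the ACPI specification pairs with _PSV is delta_P[%] = _TC1 * (Tn - Tn-1) + _TC2 * (Tn - Tt), where Tn is the current temperature, Tn-1 the temperature at the previous sample (taken every _TSP tenths of a second), and Tt the _PSV trip point; a positive delta_P means performance should be reduced. The sketch below applies it to a performance percentage. It is a hypothetical illustration, not acpi_thermal code: the function names, the clamping floor, and the coefficient and temperature values in the example are assumptions.

```python
# Hypothetical sketch of ACPI passive cooling (_PSV); not acpi_thermal code.


def passive_delta_pct(tc1: float, tc2: float,
                      t_now: float, t_prev: float, t_target: float) -> float:
    """ACPI passive cooling equation.

    delta_P[%] = _TC1 * (Tn - Tn-1) + _TC2 * (Tn - Tt)
    A positive result means "reduce performance by this many percent".
    ACPI's _TMP reports tenths of a kelvin; this sketch uses whole degrees
    for readability, so real code must check the units the coefficients assume.
    """
    trend = tc1 * (t_now - t_prev)       # reacts to how fast the zone is heating
    offset = tc2 * (t_now - t_target)    # reacts to how far above _PSV it is
    return trend + offset


def next_perf_pct(cur_pct: float, delta_pct: float,
                  floor_pct: float = 0.0) -> float:
    """Apply delta_P to the current performance level, clamped to [floor, 100]."""
    return max(floor_pct, min(100.0, cur_pct - delta_pct))


if __name__ == "__main__":
    # Hypothetical coefficients and readings (degrees C, _PSV at 90).
    tc1, tc2, psv = 2, 1, 90
    perf = 100.0
    # Heating 90 -> 92: 2*2 + 1*2 = 6, so drop performance by 6%.
    perf = next_perf_pct(perf, passive_delta_pct(tc1, tc2, 92, 90, psv))
    print(perf)  # 94.0
    # Cooling 92 -> 91: 2*(-1) + 1*1 = -1, so recover 1%.
    perf = next_perf_pct(perf, passive_delta_pct(tc1, tc2, 91, 92, psv))
    print(perf)  # 95.0
```

In this example the _TC1 term lets performance start recovering as soon as the zone begins to cool, even though it is still above _PSV; that damping is what addresses the oscillation concern raised in the thread. Under the user/kernel split described above, a kernel consumer such as acpi_thermal would apply the result at a higher cpufreq priority than a usermode daemon.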