From owner-freebsd-acpi@FreeBSD.ORG  Tue Feb 15 08:13:14 2005
Return-Path: <owner-freebsd-acpi@FreeBSD.ORG>
Delivered-To: freebsd-acpi@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 23CB816A4E9
	for <acpi@freebsd.org>; Tue, 15 Feb 2005 08:13:14 +0000 (GMT)
Received: from ylpvm01.prodigy.net (ylpvm01-ext.prodigy.net [207.115.57.32])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 95DCA43D48
	for <acpi@freebsd.org>; Tue, 15 Feb 2005 08:13:11 +0000 (GMT)
	(envelope-from nate@root.org)
Received: from [10.0.5.51] (adsl-64-171-186-189.dsl.snfc21.pacbell.net
	[64.171.186.189])j1F8D6vE005877;	Tue, 15 Feb 2005 03:13:06 -0500
Message-ID: <4211AF11.7050109@root.org>
Date: Tue, 15 Feb 2005 00:13:05 -0800
From: Nate Lawson <nate@root.org>
User-Agent: Mozilla Thunderbird 1.0RC1 (X11/20041205)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Kevin Oberman <oberman@es.net>
References: <20050214214422.C9F925D07@ptavv.es.net>
In-Reply-To: <20050214214422.C9F925D07@ptavv.es.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
cc: acpi@freebsd.org
Subject: Re: HEADSUP: cpufreq import complete, acpi_throttling changed
X-BeenThere: freebsd-acpi@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: ACPI and power management development <freebsd-acpi.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-acpi>,
	<mailto:freebsd-acpi-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-acpi>
List-Post: <mailto:freebsd-acpi@freebsd.org>
List-Help: <mailto:freebsd-acpi-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-acpi>,
	<mailto:freebsd-acpi-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Feb 2005 08:13:14 -0000

Kevin Oberman wrote:
>>Date: Mon, 14 Feb 2005 22:19:48 +0100
>>From: Pawel Worach <pawel.worach@telia.com>
>>
>>Hi, sorry for the delay. I bumped the number of retries to 2000 and I
>>can still reproduce the error if the cpu has some load, I believe that
>>is expected. Even when "idle" (gnome desktop running) it works fine with
>>100, I think the first time I tested it I had mplayer running. I can't
>>see a real-life reason for bumping the number of retries, from all
>>speeds above 200MHz I can step back up to 1.7GHz without problems
>>under light cpu load. The power_profile script should probably have a
>>min limit, 75MHz is ridiculous :)

Ok, no problem.

>>Another cool thing would be if the speed could be stepped
>>automagically based on current battery level, that would likely be the
>>job for a powerd(8).
> 
> 
> I've been thinking about this, too, 

Strangely, I've thought about this for 2 years.  ;-)

> and I think that stepping things down
> with battery level is not the answer. I think it MIGHT make sense to do
> so based on battery discharge rate. This would allow a user to configure
> an approximate battery "lifetime". It is especially important as
> batteries wear and, if two batteries are present, one discharges faster
> than the other.

I don't think battery level or discharge rate is a useful control input.
Suppose your laptop sits idle on your desk for an hour, and then you 
want to buildworld for an hour.  You certainly want the system to power 
down everything possible for the first hour and run as fast as possible 
for the second hour.  The factor almost everyone gets wrong is the 
integral:

Nate Efficiency = Useful Work Done / Amp-hours burned
Total amp-hours = Sum[t: 0...Tdead](PowerUsed(t))

This formula explains why you should run the CPU at full speed whenever 
there is work to be done (increasing the numerator) and run it as slow 
as possible when idle (slowing the drain on the denominator).  The 
denominator is ultimately fixed (you only have so much battery).
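
A rough sketch of that arithmetic, in Python.  Every number here is 
made up for illustration, and the key assumption (labeled in the code) 
is that the fast setting burns less charge per unit of work than the 
slow one.  If that does not hold on your hardware, the ranking can 
change.

```python
# Illustrative numbers only; none of them were measured on a real machine.
BATTERY_AH = 4.0                          # total charge (the fixed denominator)
IDLE_AMPS = {"low": 0.8, "high": 1.2}     # current draw while idle
BUSY_AMPS = {"low": 1.5, "high": 2.5}     # current draw while working
# Assumption: "high" finishes work 4x faster, so it costs fewer amp-hours
# per unit of work (2.5/4.0) than "low" does (1.5/1.0).
WORK_RATE = {"low": 1.0, "high": 4.0}     # units of work per hour


def work_before_dead(idle_hours, idle_speed, busy_speed):
    """Useful work done if we idle first, then work until the battery dies."""
    remaining = BATTERY_AH - IDLE_AMPS[idle_speed] * idle_hours
    busy_hours = max(remaining, 0.0) / BUSY_AMPS[busy_speed]
    return busy_hours * WORK_RATE[busy_speed]


def efficiency(idle_hours, idle_speed, busy_speed):
    """Useful work done per amp-hour burned, over the whole battery."""
    return work_before_dead(idle_hours, idle_speed, busy_speed) / BATTERY_AH


if __name__ == "__main__":
    for idle_speed, busy_speed in (("low", "high"), ("high", "high"), ("low", "low")):
        print(idle_speed, busy_speed, efficiency(1.0, idle_speed, busy_speed))
```

With these numbers, idling slow and working fast gets the most work out 
of the battery, and running slow all the time gets the least.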

On AC power, the denominator is infinite (meaning Nate is never very 
efficient).  In this case, thermal (and hence noise) concerns become the 
issue.  You also want to keep your power bill low, so conserving AC 
power is a minor but valid concern (think: server farm.)

I think the control inputs should be current system load and thermal 
level.  The system load control function would take the current 
instantaneous load, all previous measured loads, and current CPU freq 
level and output its desired frequency.  The thermal function would take 
into account temperatures of each zone, current active coolers, desired 
temperature, current CPU freq, and output its desired frequency.  There 
would be a weighting value that would allow the user to select which 
factor dominates the decision.  All this is a good research project.
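
To make the shape of this concrete, here is a minimal Python sketch of 
the two functions and the weighting.  The frequency levels, the 
thresholds, and all of the function and parameter names are 
hypothetical, and the policies themselves are deliberately naive 
placeholders rather than a proposal.

```python
LEVELS = [1700, 1400, 1200, 800, 600]   # hypothetical CPU levels, in MHz
BUSY, IDLE = 0.5, 0.1                   # made-up load thresholds (0.0 to 1.0)


def load_policy(load_now, load_history, cur_freq, levels=LEVELS):
    """Desired frequency from the newest load sample and the past samples."""
    if load_now >= BUSY:
        return max(levels)              # work to do: go as fast as possible
    samples = list(load_history) + [load_now]
    if sum(samples) / len(samples) <= IDLE:
        return min(levels)              # idle on average: go as slow as possible
    return cur_freq                     # in between: leave things alone


def thermal_policy(zone_temps, desired_temp, spare_coolers, cur_freq, levels=LEVELS):
    """Desired frequency from zone temperatures and the cooling still unused."""
    if max(zone_temps) <= desired_temp:
        return max(levels)              # no thermal objection to any speed
    if spare_coolers > 0:
        return cur_freq                 # let active cooling try first
    lower = [f for f in levels if f < cur_freq]
    return max(lower) if lower else min(levels)


def combine(load_freq, thermal_freq, thermal_weight, levels=LEVELS):
    """Blend both wishes; thermal_weight=1.0 means thermal always wins."""
    target = (1.0 - thermal_weight) * load_freq + thermal_weight * thermal_freq
    allowed = [f for f in levels if f <= target]
    return max(allowed) if allowed else min(levels)
```

For example, combine(load_policy(0.9, [0.1], 600), 
thermal_policy([60, 85], 80, 0, 1700), 0.5) blends a "go fast" wish with 
a "step down" wish and rounds down to a real level.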

Don't forget to throw in sleep states (i.e. S1 or S3) and disk spindown 
if you want to be complete.  The Linux "laptop mode" patches are a good 
example of how to do some of this right, as is iBook behavior.

> The other issue is thermal. I would assume that the frequency should be
> decreased when _PSV is reached, but should it continue to drop the
> frequency until the temperature stabilizes or until it drops to _PSV?  I
> believe the latter is a better choice, especially as the effect is not
> quite instantaneous, and since it is only read by ACPI at fairly long
> intervals. This means that the adjustment should not be too aggressive
> to prevent continual oscillation of the frequency and temperature.
> 
> And when do you start increasing the frequency again when temperature
> drops? Once again, you want to reach a thermal stability and not
> oscillate around _PSV (or at least do so slowly). As there is probably
> substantial variation between systems, a settable hysteresis is
> probably needed for really good results. (This gets worse for systems
> which don't support both throttling and frequencies.)

_PSV should be implemented in acpi_thermal.  It would control the 
frequency through CPUFREQ_GET/SET.  That's one main reason why I added 
both user and kernel interfaces.  acpi_thermal doesn't have to know what 
cpufreq devices are on the system, it just can use whatever is there.

If you read the section of the ACPI spec on _PSV, you'll see it offers 
an equation and methods for the BIOS to signal the appropriate 
coefficients for getting good hysteresis.
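
As I read the passive cooling section of the spec, the change in 
performance each sampling period (_TSP) is computed from two 
BIOS-supplied coefficients, _TC1 and _TC2, roughly as 
dP[%] = _TC1 * (Tn - Tn-1) + _TC2 * (Tn - Tt), with Tt being the _PSV 
trip point.  A small Python sketch of one step follows; the function 
and argument names are mine, and the mapping from a percentage to an 
actual hardware setting is left out.

```python
def passive_step(perf_prev, temp_now, temp_prev, psv, tc1, tc2):
    """One sampling-period update of CPU performance, in percent.

    temp_now, temp_prev and psv share one temperature unit; tc1 and tc2
    stand in for the values the BIOS reports via _TC1 and _TC2.
    """
    # The first term reacts to how fast the temperature is moving, the
    # second to how far it is from the trip point.
    delta_p = tc1 * (temp_now - temp_prev) + tc2 * (temp_now - psv)
    # Hotter or rising means a positive delta_p, so performance goes down.
    return min(100.0, max(0.0, perf_prev - delta_p))
```

The rate term is what provides the damping: performance starts coming 
back once the temperature is falling, before it is far below _PSV.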

> And, should TCC be folded into the equation for P4 systems? After all,
> that's what it's for. I don't see any way to set TCC to automatic at the
> moment, but that could be a significant tool in thermal stability.
> (There may be a way, but I didn't see it in the sources.)

I'm soon going to move p4tcc to be another relative cpufreq driver.  It 
will be under manual control although the driver is free to implement 
some hidden ultimate limit via automatic control to keep the chip from 
melting.  I think TCC already has that non-configurable feature in hw no 
matter what we do.

Whatever the case, I think optional cpufreq management (i.e. powerd) 
should be done in usermode.  This allows it to make complex decisions 
and link with lots of components (want to coordinate with a cluster over 
the network? sure!)  If it crashes, the system just uses more power or 
is slow until a user restarts it.  However, thermal or other emergency 
uses of cpufreq should be in the kernel and use the higher priorities so 
that the system doesn't melt down when a fan dies.
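
A toy Python model of that priority idea.  This is not the kernel 
code; the class, the method names, and the priority values are 
stand-ins chosen for illustration.

```python
PRIO_USER = 100     # stand-in value for a powerd-style request
PRIO_KERN = 1000    # stand-in value for thermal or emergency requests


class FreqArbiter:
    """A request is honored only if nothing stronger holds the setting."""

    def __init__(self, freq):
        self.freq = freq
        self.holder_prio = 0

    def set(self, freq, prio):
        """Try to change the frequency; return False if outranked."""
        if prio < self.holder_prio:
            return False                # weaker caller is refused
        self.freq, self.holder_prio = freq, prio
        return True

    def release(self, prio):
        """Drop the hold so weaker callers may set the frequency again."""
        if prio >= self.holder_prio:
            self.holder_prio = 0


if __name__ == "__main__":
    cpu = FreqArbiter(1700)
    cpu.set(800, PRIO_USER)             # usermode daemon saves power
    cpu.set(600, PRIO_KERN)             # fan died: kernel clamps the speed
    print(cpu.set(1700, PRIO_USER))     # daemon cannot undo the clamp
```

If the usermode daemon crashes, nothing in this model depends on it: 
the kernel-priority request still goes through.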

-- 
Nate