Date: Fri, 28 Dec 2007 13:38:13 +1100 (EST)
From: Ian Smith <smithi@nimnet.asn.au>
To: John Baldwin <jhb@freebsd.org>
Cc: acpi@freebsd.org, njl@freebsd.org
Subject: Re: An issue with powerd..
Message-ID: <Pine.BSF.3.96.1071228125242.11357A-100000@gaia.nimnet.asn.au>
In-Reply-To: <200712271449.58285.jhb@freebsd.org>
On Thu, 27 Dec 2007, John Baldwin wrote:
> So I've had some issues where I get weird hangs when I run with powerd enabled
> on my laptop and I think I've finally tracked it down. Note that this is on
> an older current with the older EC code, but the design flaw in powerd is
> still relevant even if the new EC code makes my laptop happier (I'm trying to
> update my laptop to more recent HEAD, but there's some weird scheduling bug
> that I haven't fixed yet in newer stuff).
>
> Anyways, I was trying to debug the weird hangs I had when running with powerd
> (machine would go unresponsive and fans would spin up, and after a variable
> number of seconds it would come back and all the pending input (mouse
> movements, keypresses, etc.) would be processed). I added some code to track
> how long it takes for GPE's to run that would print out on the console if one
> took more than 750ms as I had a feeling that something with ACPI was making
> the system busy.

Fans spinning up is perhaps interesting? As noted in my recent whinge
about lack of component documentation, I've yet to suss out interactions
between acpi_thermal (wrt both fans and passive cooling itself modifying
cpu freqs - could this fight with powerd?), devd and other subsystems.

Yeah I'm slowly beating through the ACPI spec, up to page 46 of >600
pages, but it's reminiscent of reading govt legislation .. I'd love to
find the ~50 page precis, then I may be better able to follow some code.

> It was also far worse in console mode than in X. In console mode I found that
> sometimes the system would never "come back".

Presumably X itself keeps it busy enough to keep cpu freq 'reasonable'?
I use gkrellm to keep an eye on cpu freq, temp, load avg ..
but my T23 is only a two-speed, min 733MHz, so I can't see what you're
seeing (and that's my faster laptop :)

> So I was running in console mode recently with my timing patches and noticed
> that when it hung it started warning about GPE events taking several
> _seconds_ to process, e.g. 2-3 seconds, or in some cases up to _30_ seconds.
> So, my theory is that powerd has lowered my CPU all the way down to 100mhz
> (easy to reproduce in non-X, just let the box sit with no apps running) and
> that for some reason the machine ends up in a "GPE storm" where it is
> spending all its time handling GPE's and never has any CPU left for userland
> apps (due to being at 100mhz). The problem then is that powerd never runs to
> bump my CPU up to some reasonable speed.

One workaround some have reported is to set debug.cpufreq.lowest to some
value considerably higher than 100MHz, say >500MHz, to maintain
reasonable responsiveness, at a cost of higher power use when idle.

> In fact, anytime a completely idle box suddenly gets a lot of kernel work
> (e.g. a sudden flow of packets) it could in theory end up trying to handle
> all this work at the reduced speed since the work has a higher priority than
> the powerd process. To that end, I think that at least part of powerd needs
> to be in the kernel, or at least that the kernel should be more proactive
> about bumping the speed up when it resumes from Cx due to an interrupt. A
> simple policy would be to bump up to full speed for any non-clock interrupt
> (possibly bumping up for a clock interrupt if we wake up softclock as well).
>
> Thoughts?

Just humble grasshopper droppings, master .. but the default powerd
polling interval is 500ms, which is a really long time on a fast box,
so -p 100 or even less might make a considerable difference?
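For what it's worth, a rough sketch of the kind of load sampling a polling daemon relies on. This is Python rather than powerd's C and is not powerd's actual code. It assumes kern.cp_time yields five tick counters in the order user, nice, sys, intr, idle; the names idle_percent and parse_cp_time are made up for illustration and the sample numbers are invented. A burst of work only shows up at the next poll, which is why the interval matters:

```python
# Assumed field order of FreeBSD's kern.cp_time: user, nice, sys, intr, idle.
CP_IDLE = 4


def parse_cp_time(text):
    """Turn the whitespace-separated sysctl output into a list of tick counts."""
    return [int(field) for field in text.split()]


def idle_percent(prev, curr):
    """Idle share (0-100) of the ticks that elapsed between two samples."""
    deltas = [c - p for p, c in zip(prev, curr)]
    total = sum(deltas)
    if total <= 0:
        # No ticks elapsed between the samples; report fully idle.
        return 100.0
    return 100.0 * deltas[CP_IDLE] / total


# Invented samples, one polling interval apart (hypothetical numbers).
before = parse_cp_time("1000 0 500 100 8400")
after = parse_cp_time("1010 0 530 110 8414")

print(idle_percent(before, after))  # 14 idle ticks out of 64 elapsed
```

On a FreeBSD box the two samples would come from `sysctl -n kern.cp_time` taken one polling interval apart. With a shorter interval the busy stretch is seen sooner, at the cost of waking the daemon more often.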
Can't comment on any in-kernel component, but responding to any sort of
single interrupt sounds way too trigger-happy compared to monitoring
load, assuming that values such as vm.loadavg and kern.cp_time are
themselves updated promptly in high-stress times?

AU$0.02, which rounds down to 0 since we abandoned coins less than 5c ..

cheers, Ian