Date:      Thu, 13 Nov 2008 16:06:24 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        Alexander Motin <mav@freebsd.org>
Cc:        Sam Leffler <sam@freebsd.org>, freebsd-mobile@freebsd.org
Subject:   Re: RFC: powerd algorithms enhancements
Message-ID:  <200811131606.24804.jhb@freebsd.org>
In-Reply-To: <491C9380.7050007@FreeBSD.org>
References:  <200811060901400000@466321507> <200811131145.39747.jhb@freebsd.org> <491C9380.7050007@FreeBSD.org>

On Thursday 13 November 2008 03:52:16 pm Alexander Motin wrote:
> John Baldwin wrote:
> >> If your system completely freezes at 400MHz, then it spends about 20% of 
> >> CPU time on this at 2GHz. Doesn't it?
> > 
> > Nope.  It is usually very idle at full speed.  You are free to go buy your own
> > HP nc6220 if you want to see it for yourself.  You can also grab the KTR
> > trace and modified schedgraph.py at www.freebsd.org/~jhb/gpe/.
> 
> It's very strange to me that you have 100% load at 400MHz, but zero at 
> full speed. It shouldn't be so!

I think systems are more complex than you give them credit for.  Imagine what 
CPU frequency changing does to SMI# handlers for example.

> Just an idea. I have noticed a problem: my mobile Core2Duo does not
> drop its TSC frequency under EST. This confuses kernel timekeeping and
> makes DELAY() times incorrectly scale up in proportion. I have
> fixed this problem for myself with "kern.timecounter.invariant_tsc=1".
> Could the same apply to your CPU?

Very, very doubtful.  This is a Pentium-M, and I know that the TSC slows down, 
because until Nate's fixes to make DELAY() work correctly, the 5-second delay 
on shutdown used to take a lot longer than 5 seconds when I was on battery 
(after being on A/C).
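The TSC point can be made concrete with a minimal sketch. It assumes a DELAY()-style busy-wait that converts the requested microseconds into TSC ticks using the frequency the kernel believes the TSC runs at, then spins until that many ticks pass. The helper names delay_ticks() and elapsed_usec() are hypothetical, and this is not FreeBSD's actual DELAY() code. The real wall-clock delay is requested_time * assumed_rate / actual_rate, so any mismatch between the calibrated and actual TSC rate scales every delay.

```c
#include <stdint.h>

/*
 * Hypothetical model of a TSC-calibrated busy-wait such as DELAY().
 * Converts microseconds to TSC ticks using the frequency the kernel
 * *assumes* the TSC runs at (e.g. calibrated at boot on A/C).
 */
static uint64_t
delay_ticks(uint64_t usec, uint64_t assumed_tsc_hz)
{
	return (usec * assumed_tsc_hz / 1000000);
}

/*
 * Wall-clock microseconds actually spent spinning for `ticks` when the
 * TSC really ticks at `actual_tsc_hz` (e.g. slowed down on battery).
 */
static double
elapsed_usec(uint64_t ticks, uint64_t actual_tsc_hz)
{
	return ((double)ticks * 1e6 / (double)actual_tsc_hz);
}
```

With illustrative numbers (not measured on the nc6220), a 5-second delay calibrated at 2 GHz is 10,000,000,000 ticks; if the TSC actually ticks at 600 MHz, elapsed_usec() reports about 16.7 seconds, which is the shape of the "5-second delay took a lot longer" symptom.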

> >>>> I think the only solutions for this case are to allow the scheduler
> >>>> to really do its job: either by moving _everything_ out of interrupt
> >>>> threads to make them extremely fast and so avoid the livelock problem,
> >>>> or in some other way allowing the scheduler to delay interrupt
> >>>> processing so that other (for example user-level) threads obtain at
> >>>> least some part of their CPU time slot according to their priorities.
> > 
> > This is completely backwards.  Userland is not more important than interrupt
> > handling in the kernel.  The problem is that CPU frequency handling is too
> > important to relegate entirely to userland.  Instead of completely
> > breaking the entire userland/kernel model to get part of userland executed at
> > a kernel-level priority so CPU frequency handling is partially handled at a
> > kernel-level priority, why not just move the CPU frequency bits that need to
> > be kernel-level into the kernel?  We are already doing the thermal management
> > for passive cooling in the kernel rather than in userland.
> 
> The fact that the system livelocks means that interrupt processing works
> outside of any priorities! By saying that moving all processing into
> interrupt handlers is a good approach, you are saying that having _all_ of
> our system outside of any priorities is a good idea. That's actually the
> situation we see now under heavy network load with polling disabled. The
> system just dies and there is no way to manage that except enabling polling!
> 
> Heavy interrupt handlers are _evil_ from the scheduling point of view! They
> may be faster in some situations, but they make the system unmanageable!
> There will never be enough power to fulfill all requirements, so we
> must handle the case when there are more interrupts than we
> are able to service.

I'm not advocating moving the entire system into interrupt handlers.  Did you 
actually read what I wrote?  My point is that if you have something in 
userland that is as important as what gets done in interrupt handlers, the 
solution is to not rip up the entire scheduler to make certain bits of 
userland have a higher priority than interrupts.  The solution is to move the 
one bit of userland code that is needed into the kernel.  In this case I'm 
not suggesting moving all of powerd into an interrupt handler.  What I am 
suggesting is that the kernel needs a policy to consider raising the
frequency when it gets an interrupt after being in a deep sleep.  If the
power savings from C2/C3/whatever are greater than those from running
throttled, then when you get an interrupt while idle it is much better to run
at full speed to service the interrupt and then return to C2/C3 ASAP, rather
than running the interrupt handler at a throttled speed and spending less time
in C2/C3.
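The trade-off in the last paragraph ("race to idle") can be checked with simple arithmetic. Below is a minimal sketch, assuming one interrupt's worth of work arrives per fixed interval, is executed at a single frequency, and the CPU then sleeps in the same deep C-state for the rest of the interval. The struct run_state, the function interval_energy(), and every power figure are hypothetical illustrations, not measurements and not any FreeBSD interface.

```c
/*
 * Hypothetical energy model: service one interrupt's work at some clock,
 * then sleep in a deep C-state for the remainder of the interval.
 */
struct run_state {
	double	hz;		/* core clock while servicing the interrupt */
	double	busy_watts;	/* power draw while executing at that clock */
};

/* Energy in joules spent over one interval of `interval_s` seconds. */
static double
interval_energy(double work_cycles, double interval_s, struct run_state run,
    double sleep_watts)
{
	double busy_s = work_cycles / run.hz;
	double idle_s = interval_s - busy_s;

	/* If the work overruns the interval, there is no sleep time at all. */
	if (idle_s < 0.0)
		idle_s = 0.0;
	return (run.busy_watts * busy_s + sleep_watts * idle_s);
}
```

With illustrative figures of 1,000,000 cycles of work per 10 ms, 1 W in C3, 15 W at 2 GHz and 5 W at 400 MHz, running at full speed costs 0.017 J per interval versus 0.020 J throttled, because the fast run buys more time in C3. If full speed instead drew 25 W, the fast case would cost 0.022 J and throttling would win; the answer depends on the numbers, which is why the paragraph states it as a conditional policy rather than a rule.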

-- 
John Baldwin


