From: Andriy Gapon <avg@icyb.net.ua>
To: freebsd-acpi@freebsd.org, freebsd-hackers@freebsd.org
Date: Tue, 19 Feb 2008 22:09:00 +0200
Subject: Re: cx_lowest and CPU usage
Message-ID: <47BB375C.5010208@icyb.net.ua>
In-Reply-To: <47B84E61.3060401@icyb.net.ua>

More news about the issue. Finally, I stopped being blind and tried to use
KTR. Graphs produced by schedgraph confirmed my suspicion that the system
spends most of its time in the idle thread (sleeping). But the RTC
interrupt hits, with high precision, within the relatively short intervals
when swi4 runs. It seems that the relatively long C2 transition times on
this hardware cause this "synchronization", but I am not sure exactly how.
It is even harder to say how long it takes the hardware (chipset+CPU) to
wake from that sleep on an interrupt.

One interesting thing that I immediately noticed is that having DUMMYNET in
my kernel made a huge difference. A typical KTR sequence looks like this:

1. we are in the idle thread (apparently sleeping in C2)
2. the clock (IRQ0) interrupt handler is executed (apparently after the CPU
is woken and the idle thread re-enables interrupts after post-C2 processing)
3. hardclock() checks whether there is any timeout to be processed by
softclock, and thanks to dummynet we have its callout scheduled every tick
4. hardclock() schedules swi4 to run (see the sketch just below)
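To make steps 3 and 4 concrete, here is my understanding of the relevant
part of hardclock(). This is a rough paraphrase from memory of
sys/kern/kern_clock.c, not a verbatim quote, so the identifiers below may
not match the tree exactly:

	/*
	 * Rough idea of the tail of hardclock(): if the callwheel bucket
	 * for the current tick is non-empty, softclock has work to do, so
	 * the softclock SWI (swi4) gets scheduled.  With dummynet in the
	 * kernel, its callout is re-armed every tick, so this check
	 * effectively succeeds on every hardclock.
	 */
	need_softclock = 0;
	mtx_lock_spin(&callout_lock);
	if (!TAILQ_EMPTY(&callwheel[ticks & callwheelmask]))
		need_softclock = 1;
	mtx_unlock_spin(&callout_lock);
	if (need_softclock)
		swi_sched(softclock_ih, 0);	/* step 4: make swi4 runnable */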
I am fuzzy about the following steps: I see the traces, but I don't know
the code well, so I am writing down my guess and knowledgeable people can
correct me.

5. near the end of the interrupt handler routine there is a critical_exit()
call, which can call mi_switch() if the current thread needs to be
preempted by a different thread; in this case the interrupt code runs in
the context of the system idle thread, and the idle thread now "owes
preemption" to swi4 (see step 4)
6. so, apparently, mi_switch() happens in the idle thread and execution
jumps to the swi4 thread, which now returns from mi_switch() in the soft
interrupt loop routine
7. so far, starting from step 2, execution has been running with interrupts
disabled; now swi4 releases a spin lock and that re-enables interrupts
(see the previous message quoted below)
8. "immediately", the RTC (IRQ8) interrupt handler is executed; apparently
it has been pending for some time by now
9. statclock() is called, and because we are on a swi thread this RTC tick
is charged to "interrupts"

So, here we are. But I am still not sure why the above happens so
"deterministically": the RTC interrupt can happen anywhere between two
clock interrupts. That is a 1000us window for the RTC versus only ~90us of
C2 exit delay, which should not be enough to always put IRQ0 and IRQ8
"together".

Anyway, I excluded DUMMYNET from the kernel config, but kept SCHED_ULE and
HZ=1000. Now, with C2 enabled, I get 15-20% of (reported!) interrupt load
on a completely idle system. That is still a lot, but it is much less than
the 99% I had before :-)

Next, without DUMMYNET, I will get new KTR dumps for cx_lowest=C1 and
cx_lowest=C2. I hope that this much cleaner experiment will give more
insight. But even now we can be sure that our CPU statistics code is not
always adequate. This is because IRQ8 is not processed like some
super-interrupt that looks at the system from above. It is a regular
participant in the system: it competes with IRQ0 and is affected by the
scheduler's decisions, especially those made while in an interrupt handler.
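For reference, the accounting that leads to step 9 is, as far as I
understand it, roughly the following. This is a simplified paraphrase from
memory of statclock() in sys/kern/kern_clock.c (the real code takes the
thread lock and uses atomic increments), so the exact flag and field names
may differ:

	if (usermode) {
		td->td_uticks++;
		cp_time[p->p_nice > NZERO ? CP_NICE : CP_USER]++;
	} else if ((td->td_pflags & TDP_ITHREAD) ||
	    td->td_intr_nesting_level >= 2) {
		/*
		 * curthread is an interrupt thread (swi4 in our case): the
		 * whole statclock tick is charged to "interrupt", even
		 * though the CPU was sleeping in C2 microseconds earlier.
		 */
		td->td_iticks++;
		cp_time[CP_INTR]++;
	} else {
		td->td_sticks++;
		cp_time[TD_IS_IDLETHREAD(td) ? CP_IDLE : CP_SYS]++;
	}

So whether the tick lands in "idle" or in "interrupt" depends entirely on
which thread happens to be current when IRQ8 fires, which is exactly what
the traces show.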
on 17/02/2008 17:10 Andriy Gapon said the following:
> on 14/02/2008 23:31 Andriy Gapon said the following:
>> I ran a series of tests, repeating each twice to be sure that I didn't
>> make any mistake.
>> All tests were performed in single-user mode, so the system was as idle
>> as possible, top reported idle as 99.N% - 100%.
>> Then I set hw.acpi.cpu.cx_lowest=C2 and ran top again.
>> Here's the results:
>> GENERIC, SCHED_4BSD, default HZ=1000 ==> C2-interrupt 11-14% (!!)
>> GENERIC, SCHED_4BSD, kern.hz="100" ==> C2-interrupt 99-100%
>> customized kernel, SCHED_ULE, default HZ=1000 ==> C2-interrupt 99-100%
>> customized kernel, SCHED_ULE, kern.hz="100" ==> C2-interrupt 99-100%
>
> I did some testing and here are some additional data.
> Running with sched_ule and hz=1000 I measured that mean time spent in C2
> state (from starting to go into it, till completely returning from it)
> is ~860us (end_time - start_time, no overhead adjustments).
> Additionally, 98.1% of sleeps lasted 800-900us, 1.8% lasted 700-800us,
> the rest is spread almost uniformly over the remaining intervals of
> the 0-1000us range.
>
> Here's a typical backtrace for statclock execution:
> #9 0xc04cce26 in statclock (usermode=0) at
> /system/src/sys/kern/kern_clock.c:447
> #10 0xc06b5f0c in rtcintr (frame=0xcfb30c78) at
> /system/src/sys/i386/isa/clock.c:236
> #11 0xc06a5275 in intr_execute_handlers (isrc=0xc0736900,
> frame=0xcfb30c78) at /system/src/sys/i386/i386/intr_machdep.c:362
> #12 0xc06b558c in atpic_handle_intr (vector=8, frame=0xcfb30c78) at
> /system/src/sys/i386/isa/atpic.c:596
> #13 0xc06a0d71 in Xatpic_intr8 () at atpic_vector.s:70
> #14 0xc06a70c9 in spinlock_exit () at cpufunc.h:365
> #15 0xc04e6993 in ithread_loop (arg=0xc22f5980) at
> /system/src/sys/kern/kern_intr.c:1137
> #16 0xc04e32c1 in fork_exit (callout=0xc04e6640 <ithread_loop>,
> arg=0xc22f5980, frame=0xcfb30d38) at /system/src/sys/kern/kern_fork.c:754
> #17 0xc06a0bc0 in fork_trampoline () at
> /system/src/sys/i386/i386/exception.s:205
>
> I.e. the swi4 kernel thread is "caught" by IRQ8 just when it exits from
> thread_unlock() in the following snippet:
> 1132         if (!ithd->it_need && !(ithd->it_flags & IT_DEAD)) {
> 1133                 TD_SET_IWAIT(td);
> 1134                 ie->ie_count = 0;
> 1135                 mi_switch(SW_VOL, NULL);
> 1136         }
> 1137         thread_unlock(td);
>
> This happens almost deterministically.
>

-- 
Andriy Gapon