From owner-freebsd-hackers@FreeBSD.ORG Wed Feb 20 18:34:01 2008
Message-ID: <47BC7287.6000301@icyb.net.ua>
Date: Wed, 20 Feb 2008 20:33:43 +0200
From: Andriy Gapon <avg@icyb.net.ua>
User-Agent: Thunderbird 2.0.0.9 (X11/20071208)
To: freebsd-acpi@freebsd.org, freebsd-hackers@freebsd.org
References: <479F0ED4.9030709@icyb.net.ua> <479F62D9.6080703@root.org> <47A33CCB.3090902@icyb.net.ua> <47B0C10F.6000109@icyb.net.ua> <47B4103A.6090902@icyb.net.ua> <47B4A103.7040801@icyb.net.ua> <47B4B31A.4020605@icyb.net.ua> <47B84E61.3060401@icyb.net.ua> <47BB375C.5010208@icyb.net.ua> <47BB4D5C.9000406@icyb.net.ua>
In-Reply-To: <47BB4D5C.9000406@icyb.net.ua>
Subject: Re: cx_lowest and CPU usage
on 19/02/2008 23:42 Andriy Gapon said the following:
> The last result most probably means that the RTC IRQ was not the interrupt
> that woke the CPU from its sleeping state.
> The first possibility that comes to mind is that on this particular
> hardware the RTC interrupt (IRQ8) is not able to wake the system from the
> C2 state.

So it seems that this was true. Here is a shortcut to the relevant info:
PIIX4E (FW82371EB) specification, DEVACTB — DEVICE ACTIVITY B (FUNCTION 3)
PCI register description, BRLD_EN_IRQ8, bit 5:

$ pciconf -r pci0:0:7:3 0x58
03040c07

Bit 5 is clear in this value, i.e. IRQ8 is not enabled as a break event
that could bring the CPU out of C2.

> Is this possible?
> Are there known instances of such breakage?
> Are there other possible explanations?
>
> on 19/02/2008 22:09 Andriy Gapon said the following:
>> More news about the issue. Finally, I stopped being blind and tried to
>> use KTR.
>> Graphs produced by schedgraph confirmed my suspicion that the system
>> spends most of its time in the idle thread (sleeping).
>> But the RTC interrupt hits, with high precision, the relatively very
>> short intervals when swi4 runs.
>> It seems that the relatively long C2 transition times on this hardware
>> cause this "synchronization", but I am not sure exactly how. It is even
>> harder to say how long it takes the hardware (chipset+CPU) to wake from
>> that sleep on an interrupt.
>> One interesting thing that I immediately noticed is that having DUMMYNET
>> in my kernel made a huge difference.
>> A typical KTR sequence looks like this:
>> 1. we are in the idle thread (apparently sleeping in C2)
>> 2. the clock (IRQ0) interrupt handler is executed (apparently after the
>> CPU is woken, and the idle thread enables interrupts after post-C2
>> processing)
>> 3. hardclock() checks whether there is any timeout to be processed by
>> softclock, and thanks to dummynet we have its callout scheduled every
>> tick
>> 4. hardclock() schedules swi4 to run
>>
>> I am fuzzy about the following steps; I see traces, but I don't know the
>> code, so I write down my guess and knowledgeable people can correct me.
>> 5. near the end of the interrupt handler routine we have a
>> critical_exit() call; it can call mi_switch() if the current thread
>> needs to be preempted by a different thread; in this case the interrupt
>> code is executed in the context of the system idle thread, and the idle
>> thread is "owing preemption" to swi4 (see step 4)
>> 6. so apparently mi_switch() happens in the idle thread and execution
>> jumps to the swi4 thread, which now exits from mi_switch() in the soft
>> interrupt loop routine
>> 7. so far, starting from step 2, execution has been with interrupts
>> disabled; now swi4 unlocks a spin mutex and that enables interrupts (see
>> the previous message quoted below)
>> 8. "immediately", the RTC (IRQ8) interrupt handler is executed;
>> apparently it has been pending for some time by now
>> 9. statclock is called, and because we are on a swi thread this RTC tick
>> is charged to "interrupts"
>>
>> So, here we are. But I am still not sure why the above happens so
>> "deterministically": an RTC interrupt can happen anywhere between two
>> clock interrupts. So we have a 1000us range for RTC and a 90us C2 exit
>> delay; that should not be enough to always put IRQ0 and IRQ8 "together".
>>
>> Anyway, I excluded DUMMYNET from the kernel config, but kept SCHED_ULE
>> and HZ=1000. Now with C2 enabled I get 15-20% of (reported!) interrupt
>> load on a completely idle system. That is still a lot, but it is much
>> less than the 99% I had before :-)
>>
>> Now, without DUMMYNET, I will get new KTR dumps for cx_lowest=C1 and
>> cx_lowest=C2. I hope that this much cleaner experiment will give more
>> insight.
>>
>> But even now we can be sure that our CPU statistics code is not always
>> adequate. This is because IRQ8 is not processed like some
>> super-interrupt that looks at the system from above.
>> It is a regular participant in the system: it competes with IRQ0 and is
>> affected by the scheduler's decisions, especially those made while in an
>> interrupt handler.
>>
>>
>>
>> on 17/02/2008 17:10 Andriy Gapon said the following:
>>> on 14/02/2008 23:31 Andriy Gapon said the following:
>>>> I ran a series of tests, repeating each twice to be sure that I did
>>>> not make any mistakes.
>>>> All tests were performed in single-user mode, so the system was as
>>>> idle as possible; top reported idle as 99.N% - 100%.
>>>> Then I set hw.acpi.cpu.cx_lowest=C2 and ran top again.
>>>> Here are the results:
>>>> GENERIC, SCHED_4BSD, default HZ=1000 ==> C2-interrupt 11-14% (!!)
>>>> GENERIC, SCHED_4BSD, kern.hz="100" ==> C2-interrupt 99-100%
>>>> customized kernel, SCHED_ULE, default HZ=1000 ==> C2-interrupt 99-100%
>>>> customized kernel, SCHED_ULE, kern.hz="100" ==> C2-interrupt 99-100%
>>> I did some testing and here is some additional data.
>>> Running with sched_ule and hz=1000 I measured that the mean time spent
>>> in the C2 state (from starting to go into it until completely returning
>>> from it) is ~860us (end_time - start_time, no overhead adjustments).
>>> Additionally, 98.1% of sleeps lasted 800-900us, 1.8% lasted 700-800us,
>>> and the rest is spread almost uniformly over the remaining intervals of
>>> the 0-1000us range.
>>>
>>> Here's a typical backtrace for statclock execution:
>>> #9  0xc04cce26 in statclock (usermode=0)
>>>     at /system/src/sys/kern/kern_clock.c:447
>>> #10 0xc06b5f0c in rtcintr (frame=0xcfb30c78)
>>>     at /system/src/sys/i386/isa/clock.c:236
>>> #11 0xc06a5275 in intr_execute_handlers (isrc=0xc0736900, frame=0xcfb30c78)
>>>     at /system/src/sys/i386/i386/intr_machdep.c:362
>>> #12 0xc06b558c in atpic_handle_intr (vector=8, frame=0xcfb30c78)
>>>     at /system/src/sys/i386/isa/atpic.c:596
>>> #13 0xc06a0d71 in Xatpic_intr8 () at atpic_vector.s:70
>>> #14 0xc06a70c9 in spinlock_exit () at cpufunc.h:365
>>> #15 0xc04e6993 in ithread_loop (arg=0xc22f5980)
>>>     at /system/src/sys/kern/kern_intr.c:1137
>>> #16 0xc04e32c1 in fork_exit (callout=0xc04e6640 <ithread_loop>,
>>>     arg=0xc22f5980, frame=0xcfb30d38)
>>>     at /system/src/sys/kern/kern_fork.c:754
>>> #17 0xc06a0bc0 in fork_trampoline ()
>>>     at /system/src/sys/i386/i386/exception.s:205
>>>
>>> I.e. the swi4 kernel thread is "caught" by IRQ8 just when it exits from
>>> thread_unlock() in the following snippet:
>>> 1132         if (!ithd->it_need && !(ithd->it_flags & IT_DEAD)) {
>>> 1133                 TD_SET_IWAIT(td);
>>> 1134                 ie->ie_count = 0;
>>> 1135                 mi_switch(SW_VOL, NULL);
>>> 1136         }
>>> 1137         thread_unlock(td);
>>>
>>> This happens almost deterministically.
>>>
>>
>

-- 
Andriy Gapon