Date: Tue, 27 Oct 2015 04:44:43 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 173541] load average 0.60 at 100% idle
Message-ID: <bug-173541-8-zLVENpgOiZ@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-173541-8@https.bugs.freebsd.org/bugzilla/>
References: <bug-173541-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541

Jeremy Chadwick <jdc@koitsu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jdc@koitsu.org,
                   |                            |mav@FreeBSD.org

--- Comment #8 from Jeremy Chadwick <jdc@koitsu.org> ---
(Warning: below is a long comment with a large amount of detail. I am
CC'ing mav@ because I believe this may have to do with his eventtimer
changes made sometime in 2012[1]. I'll provide some information that may
greatly help narrow down the root cause near the end).

I reported this problem, specific to stable/10 (i.e. the issue I see may
be different than the one reported here), back in June 2014[2]. A visual
representation of the difference between stable/10 and stable/9 on said
hardware (bare-metal, not virtualised) is available[3]. The system in
question still runs stable/9 (r289992) today without any load oddities.

All of my KTR details and so on from that system are available here[4],
where Adrian Chadd attempted to help me figure out what was going on.
The mail thread (see [1] again) consisted of many people chiming in with
useful details. At some point during this thread, maybe not publicly (I
would need to review my mail archives), someone (John Baldwin? I'm not
certain, my memory is not as good as it used to be) informed me that
stable/10 had its load calculation algorithm changed.

Since July 2014, I have been able to reproduce this problem with
stable/10 on several different types of systems:

- Supermicro X7SBA (bare metal)
- VMware Workstation 11.x (FreeBSD stable/10 is a guest)
- VPS provider Vultr.com (using QEMU) (FreeBSD stable/10 is a guest)

There are, however, some individuals running stable/10 (e.g. a friend of
mine using FreeNAS) on bare-metal hardware that do not experience this
problem.

Today I spent some time on a stable/10 (r289827) guest under VMware
Workstation 11.x where I was able to reproduce the issue and found
something surprising: kern.eventtimer.periodic=1 completely alleviates
the problem -- the instant this sysctl is set to 1 ("periodic mode") the
load begins to plummet and eventually hits 0.00, only increasing when
there is "real" load. You can even set kern.eventtimer.periodic=0
("one-shot") maybe 5-6 minutes after that and still see the system load
at 0.00, although over the next 10+ minutes it will begin to gradually
increase again.

The downside/problem to using "periodic mode": vmstat -i shows a
cpuX:timer rate of almost 2000 on both CPU cores, which, if I understand
eventtimers(4) correctly, would indicate extremely high interrupt usage
(in my case, LAPIC, i.e. local APIC).

I just find it very interesting that somehow kern.eventtimer.periodic=1
is able to alleviate the issue. This leads me to wonder if the issue is
with eventtimers(4) changes, or with the load calculation algorithm (or
some "intermediary" piece that isn't taking into consideration certain
things).

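A minimal sketch for watching the effect described above, assuming
Python is available on the affected guest and the script is run as root.
The helper names set_periodic and sample_load are illustrative and not
part of the original report; only the sysctl name comes from the text
above.

```python
import os
import subprocess
import time


def set_periodic(enabled):
    """Switch kern.eventtimer.periodic on (1) or off (0); needs root."""
    value = "1" if enabled else "0"
    subprocess.run(["sysctl", f"kern.eventtimer.periodic={value}"], check=True)


def sample_load(samples, interval_s, sleep=time.sleep, getloadavg=os.getloadavg):
    """Return `samples` readings of the 1-minute load average.

    Readings are taken `interval_s` seconds apart. `sleep` and
    `getloadavg` are injectable so the loop can be exercised without
    waiting or touching the real system.
    """
    readings = []
    for i in range(samples):
        readings.append(getloadavg()[0])  # index 0 is the 1-minute average
        if i + 1 < samples:
            sleep(interval_s)
    return readings
```

On an idle system one would call set_periodic(True), then print the
result of sample_load(20, 30) to see whether the 1-minute figure drifts
toward 0.00 over roughly ten minutes, and repeat after
set_periodic(False) to see whether it climbs back.
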
On the VM in question, the eventtimer choices are:

root@test-freebsd:~ # sysctl -a | grep kern.eventtimer
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.LAPIC.frequency: 33000101
kern.eventtimer.et.LAPIC.flags: 7

On a Vultr.com VPS (QEMU), the eventtimer choices are (according to
stable/9, which I use there):

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.LAPIC.frequency: 500017353
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)

And finally on the aforementioned Supermicro X7SBA box, the eventtimer
choices (per stable/9) are:

kern.eventtimer.singlemul: 2
kern.eventtimer.idletick: 0
kern.eventtimer.activetick: 1
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 0
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.LAPIC.frequency: 166681842
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.choice: LAPIC(400) i8254(100) RTC(0)

Per hpet(4) I have tried the following in /boot/loader.conf on all of
the above VM systems to no avail (probably because they do not use
HPET):

hint.hpet.X.legacy_route=1
hint.attimer.0.clock=0
hint.atrtc.0.clock=0

Another thing I found interesting on stable/10 under the VMware VM was
that this message is printed:

Event timer "HPET" frequency 14318180 Hz quality 550

Yet the eventtimer is nowhere to be found.

Anyway, this problem has come up *repeatedly* on multiple FreeBSD
mailing lists and on the forums yet has gotten no
traction[5][6][7][8][9][10].

I feel, respectfully, that this has gone on long enough. This problem
greatly diminishes the feasibility of using FreeBSD as a server
operating system (when you can no longer trust or rely on the load
average, what can you believe?). End-users and even skilled/senior
administrators do not know how to diagnose or troubleshoot + rectify
this problem given where it lies in kernel space.

Please remember: the problem is 100% reproducible.

I am happy to pay for a low-end Vultr.com dedicated VPS (includes
console access as long as you have a web browser w/ Javascript) and/or
set up a dedicated VMware Workstation VM (w/ remote VNC console) for any
developer who wants to take it on, on my own dime. I cannot provide the
aforementioned Supermicro X7SBA box for testing (it's my FreeBSD box and
must remain usable).

Alexander (mav@) -- what can we provide you that can help narrow this
down?

P.S. -- The Importance field needs to be changed from Affects Only Me to
Affects Some People, as this is definitely not specific to just Viktor.
Also, I would strongly suggest changing Depends On to reference bug
192315, which references eventtimer issues (but may be a different
problem -- I'll let mav@ decide).

[1]: https://lists.freebsd.org/pipermail/freebsd-net/2012-April/031893.html
[2]: https://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079386.html
[3]: http://jdc.koitsu.org/freebsd/releng10_perf_issue/loadavg_comparison.png
[4]: http://jdc.koitsu.org/freebsd/releng10_perf_issue/
[5]: https://lists.freebsd.org/pipermail/freebsd-virtualization/2014-October/002835.html
[6]: https://forums.freebsd.org/threads/high-idle-load-in-vps.52688/
[7]: https://forums.freebsd.org/threads/high-cpu-utilization-by-average-load-values-while-no-processes-in-top-output-have-high-cpu-load.48933/
[8]: https://lists.freebsd.org/pipermail/freebsd-bugs/2013-January/051385.html
[9]: https://lists.freebsd.org/pipermail/freebsd-xen/2014-January/002006.html
[10]: https://forums.freebsd.org/threads/high-load-average-with-idle-state.38757/

-- 
You are receiving this mail because:
You are the assignee for the bug.