Date: Mon, 21 Jul 2014 00:16:48 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Jeremy Chadwick" <jdc@koitsu.org>, "Adrian Chadd" <adrian@freebsd.org>
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: Consistently "high" CPU load on 10.0-STABLE
Message-ID: <3E5D732C440140B9AEE204B91E5B120E@multiplay.co.uk>
References: <20140720062413.GA56318@icarus.home.lan> <97EA8E571E634DBBAA70F7AA7F0DE97C@multiplay.co.uk> <20140720173524.GA67065@icarus.home.lan> <ED826825202341E58B71A3F718B60562@multiplay.co.uk> <20140720201655.GA70545@icarus.home.lan> <CAJ-Vmo=O-OH-Ljk1u-SGF1L=jj=Lnwd_aQs0Bqc5Mi03jZvkuw@mail.gmail.com> <20140720225845.GA81033@icarus.home.lan>

----- Original Message -----
From: "Jeremy Chadwick" <jdc@koitsu.org>
To: "Adrian Chadd" <adrian@freebsd.org>
Cc: "Steven Hartland" <killing@multiplay.co.uk>; "FreeBSD Stable Mailing List" <freebsd-stable@freebsd.org>
Sent: Sunday, July 20, 2014 11:58 PM
Subject: Re: Consistently "high" CPU load on 10.0-STABLE

> On Sun, Jul 20, 2014 at 03:09:55PM -0700, Adrian Chadd wrote:
>> hi,
>>
>> it looks like a whole lot of things are waking up at the same time:
>>
>> * dhcpd
>> * em
>> * usb devices
>>
>> So, do you have some shared interrupts going on here? That seems to be
>> what's causing things to all wake up all at once.
>
> I forget how to get an interrupt mapping from the I/O APIC, but dmesg
> indicates the following. Sorted by IRQ order so that you can tell
> what's associated with what, and also RELENG_9 vs. RELENG_10 (because I
> do have an old dmesg.today from this box running RELENG_9). All the
> IRQs match up:
>
> dev       RELENG_10      RELENG_9
> --------  -------------  -------------
> ioapic0   IRQs 0 to 23   IRQs 0 to 23   (same)
> ioapic1   IRQs 24 to 47  IRQs 24 to 47  (same)
> attimer0  IRQ 0          IRQ 0          (same)
> atkbdc0   IRQ 1          IRQ 1          (same)
> atkbd0    IRQ 1          IRQ 1          (same)
> uart1     IRQ 3          IRQ 3          (same)
> uart0     IRQ 4          IRQ 4          (same)
> atrtc0    IRQ 8          IRQ 8          (same)
> em0       IRQ 16         IRQ 16         (same)
> pcib1     IRQ 16         IRQ 16         (same)
> pcib3     IRQ 16         IRQ 16         (same)
> pcib4     IRQ 16         IRQ 16         (same)
> uhci0     IRQ 16         IRQ 16         (same)
> ahci0     IRQ 17         IRQ 17         (same)
> em1       IRQ 17         IRQ 17         (same)
> ichsmb0   IRQ 17         IRQ 17         (same)
> pcib5     IRQ 17         IRQ 17         (same)
> uhci1     IRQ 17         IRQ 17         (same)
> ehci0     IRQ 18         IRQ 18         (same)
> uhci2     IRQ 18         IRQ 18         (same)
> uhci5     IRQ 18         IRQ 18         (same)
> siis0     IRQ 21         IRQ 21         (same)
> uhci4     IRQ 22         IRQ 22         (same)
> ehci1     IRQ 23         IRQ 23         (same)
> uhci3     IRQ 23         IRQ 23         (same)
>
> And the higher-numbered IRQs per vmstat -i. I only have this for
> RELENG_10 however:
>
> irq256: em0           1848856   26
> irq259: ahci0:ch0      273086    3
> irq260: ahci0:ch1        9990    0
> irq261: ahci0:ch2       48514    0
> irq262: ahci0:ch3       48046    0
> irq263: ahci0:ch4       48258    0
> irq264: ahci0:ch5       48052    0
>
> vmstat -i for this is kinda painful (discussed this with jhb@ in the
> past, re: kernel just appending "+" to the string to indicate "many
> things using this IRQ").
>
> I have absolutely no USB devices attached to the system (meaning there are
> USB controllers and ports, yeah, but nothing attached to any of them).
> The keyboard is PS/2. All disks are on ahci0 (no disks currently
> attached to siis0).
>
> As for dhcpd: I don't know how that'd be responsible. If I stop the
> process entirely I still see the problem.
>
> I can provide some more ktrdumps, along with turning off as many daemons
> + cron jobs as I can, if you feel that'd be helpful.
>
> Likewise I can provide an ACPI DSDT dump if that would be useful (maybe
> to someone else).
>
> I haven't tried booting the box in single-user and letting it sit there
> to see if anything shows up there.
>
> In the interim I wrote the perl script I mentioned in my mail to Steve.
> When the load shoots up, there is literally no field in "vmstat -s"
> that shows a humongous increase (or decrease) consistently. Meaning
> I'd say 95% of the time when there's a sudden load jump, none of those
> statistics I can correlate with it. It's a pretty "meh" script, but
> it does the job of showing deltas between vmstat -s runs and indicating
> visually when there's a jump in load average (1m avg). It requires a
> VERY wide terminal (about 301 characters):
>
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/load_vmstat.pl
>
> Some example output is here (obviously can't see the red+bold
> highlighting of the line):
>
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/example_data.txt
>
> Load jumps at the following time indexes:
>
> 124.0 (from 0.02 to 0.10, load delta: 0.08)
> 153.0 (from 0.06 to 0.14, load delta: 0.08, time delta: 29.0 sec)
> 178.5 (from 0.10 to 0.17, load delta: 0.07, time delta: 25.5 sec)
> 217.0 (from 0.09 to 0.17, load delta: 0.08, time delta: 38.5 sec)
> 236.0 (from 0.12 to 0.19, load delta: 0.07, time delta: 19.0 sec)
> 244.0 (from 0.17 to 0.24, load delta: 0.07, time delta: 8.0 sec)
> 259.0 (from 0.20 to 0.27, load delta: 0.07, time delta: 15.0 sec)
> 284.5 (from 0.19 to 0.25, load delta: 0.06, time delta: 25.5 sec)
> 310.0 (from 0.18 to 0.25, load delta: 0.07, time delta: 25.5 sec)
> 341.5 (from 0.27 to 0.33, load delta: 0.06, time delta: 31.5 sec)
>
> Some of these could be due to cron jobs I run (though they really aren't
> that intensive on disk, CPU, or memory), but there's a pretty consistent
> pattern going on there load-wise. The reason I noted time deltas was
> watching for "periodic tasks", e.g. ZFS txg flush. But this seems to
> have a little bit more variance.
>
> It's just that none of the vmstat -s statistics change rapidly alongside
> the load. But I'm sure there are VM bits that aren't tracked in vmstat.

Not sure if it's in stable/10, but there was some talk about making ZFS use
lz4 for some things by default. I wonder if that might have something to do
with it?

    Regards
    Steve