FreeBSD Mail Archives

Date:      Mon, 21 Jul 2014 00:16:48 +0100
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Jeremy Chadwick" <jdc@koitsu.org>, "Adrian Chadd" <adrian@freebsd.org>
Cc:        FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Consistently "high" CPU load on 10.0-STABLE
Message-ID:  <3E5D732C440140B9AEE204B91E5B120E@multiplay.co.uk>
References:  <20140720062413.GA56318@icarus.home.lan> <97EA8E571E634DBBAA70F7AA7F0DE97C@multiplay.co.uk> <20140720173524.GA67065@icarus.home.lan> <ED826825202341E58B71A3F718B60562@multiplay.co.uk> <20140720201655.GA70545@icarus.home.lan> <CAJ-Vmo=O-OH-Ljk1u-SGF1L=jj=Lnwd_aQs0Bqc5Mi03jZvkuw@mail.gmail.com> <20140720225845.GA81033@icarus.home.lan>

----- Original Message ----- 
From: "Jeremy Chadwick" <jdc@koitsu.org>
To: "Adrian Chadd" <adrian@freebsd.org>
Cc: "Steven Hartland" <killing@multiplay.co.uk>; "FreeBSD Stable Mailing List" <freebsd-stable@freebsd.org>
Sent: Sunday, July 20, 2014 11:58 PM
Subject: Re: Consistently "high" CPU load on 10.0-STABLE


> On Sun, Jul 20, 2014 at 03:09:55PM -0700, Adrian Chadd wrote:
>> hi,
>> 
>> it looks like a whole lot of things are waking up at the same time:
>> 
>> * dhcpd
>> * em
>> * usb devices
>> 
>> So, do you have some shared interrupts going on here? That seems to be
>> what's causing everything to wake up at once.
> 
> I forget how to get an interrupt mapping from the I/O APIC, but dmesg
> indicates the following.  It's sorted by IRQ so you can tell what's
> associated with what, and it compares RELENG_9 with RELENG_10 (I do
> have an old dmesg.today from this box running RELENG_9).  All the
> IRQs match up:
> 
> dev       RELENG_10      RELENG_9
> --------  -------------  -------------
> ioapic0   IRQs  0 to 23  IRQs  0 to 23 (same)
> ioapic1   IRQs 24 to 47  IRQs 24 to 47 (same)
> attimer0  IRQ 0          IRQ 0 (same)
> atkbdc0   IRQ 1          IRQ 1 (same)
> atkbd0    IRQ 1          IRQ 1 (same)
> uart1     IRQ 3          IRQ 3 (same)
> uart0     IRQ 4          IRQ 4 (same)
> atrtc0    IRQ 8          IRQ 8 (same)
> em0       IRQ 16         IRQ 16 (same)
> pcib1     IRQ 16         IRQ 16 (same)
> pcib3     IRQ 16         IRQ 16 (same)
> pcib4     IRQ 16         IRQ 16 (same)
> uhci0     IRQ 16         IRQ 16 (same)
> ahci0     IRQ 17         IRQ 17 (same)
> em1       IRQ 17         IRQ 17 (same)
> ichsmb0   IRQ 17         IRQ 17 (same)
> pcib5     IRQ 17         IRQ 17 (same)
> uhci1     IRQ 17         IRQ 17 (same)
> ehci0     IRQ 18         IRQ 18 (same)
> uhci2     IRQ 18         IRQ 18 (same)
> uhci5     IRQ 18         IRQ 18 (same)
> siis0     IRQ 21         IRQ 21 (same)
> uhci4     IRQ 22         IRQ 22 (same)
> ehci1     IRQ 23         IRQ 23 (same)
> uhci3     IRQ 23         IRQ 23 (same)
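
To make the shared-interrupt question mechanical, here is a minimal Python sketch that groups the device rows of the table above by IRQ and keeps only IRQs used by more than one device. It assumes the rows are pasted in exactly the format shown (device, then "IRQ n" for RELENG_10, then the RELENG_9 column); only the first IRQ column is read, and the name `shared_irqs` is hypothetical, not part of any FreeBSD tool.

```python
import re
from collections import defaultdict

# Device/IRQ rows from the RELENG_10 vs. RELENG_9 dmesg table above.
# The ioapic0/ioapic1 rows are left out because they describe IRQ ranges.
TABLE = """\
attimer0  IRQ 0          IRQ 0 (same)
atkbdc0   IRQ 1          IRQ 1 (same)
atkbd0    IRQ 1          IRQ 1 (same)
uart1     IRQ 3          IRQ 3 (same)
uart0     IRQ 4          IRQ 4 (same)
atrtc0    IRQ 8          IRQ 8 (same)
em0       IRQ 16         IRQ 16 (same)
pcib1     IRQ 16         IRQ 16 (same)
pcib3     IRQ 16         IRQ 16 (same)
pcib4     IRQ 16         IRQ 16 (same)
uhci0     IRQ 16         IRQ 16 (same)
ahci0     IRQ 17         IRQ 17 (same)
em1       IRQ 17         IRQ 17 (same)
ichsmb0   IRQ 17         IRQ 17 (same)
pcib5     IRQ 17         IRQ 17 (same)
uhci1     IRQ 17         IRQ 17 (same)
ehci0     IRQ 18         IRQ 18 (same)
uhci2     IRQ 18         IRQ 18 (same)
uhci5     IRQ 18         IRQ 18 (same)
siis0     IRQ 21         IRQ 21 (same)
uhci4     IRQ 22         IRQ 22 (same)
ehci1     IRQ 23         IRQ 23 (same)
uhci3     IRQ 23         IRQ 23 (same)
"""


def shared_irqs(table: str) -> dict[int, list[str]]:
    """Return {irq: [devices]} for every IRQ that more than one device uses.

    Only the first (RELENG_10) IRQ column is read. Range rows such as
    "IRQs 0 to 23" never match, because "IRQs" is not followed by whitespace.
    """
    by_irq = defaultdict(list)
    for line in table.splitlines():
        m = re.match(r"(\S+)\s+IRQ\s+(\d+)\b", line)
        if m:
            by_irq[int(m.group(2))].append(m.group(1))
    return {irq: devs for irq, devs in sorted(by_irq.items()) if len(devs) > 1}


shared = shared_irqs(TABLE)
for irq, devs in shared.items():
    print(f"IRQ {irq:>2}: {', '.join(devs)}")
```

On this table it reports IRQ 16 and IRQ 17 with five devices each, IRQ 18 with three (ehci0, uhci2, uhci5), and IRQ 1 and IRQ 23 with two. Bear in mind this only reflects the legacy interrupt lines dmesg reports: em0 and ahci0 also appear in the vmstat -i output below on irq256 and irq259 and up, which suggests they are actually using MSI vectors rather than the shared lines.
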
> 
> And the higher-numbered IRQs per vmstat -i.  I only have this for
> RELENG_10 however:
> 
> interrupt                          total       rate
> irq256: em0                      1848856         26
> irq259: ahci0:ch0                 273086          3
> irq260: ahci0:ch1                   9990          0
> irq261: ahci0:ch2                  48514          0
> irq262: ahci0:ch3                  48046          0
> irq263: ahci0:ch4                  48258          0
> irq264: ahci0:ch5                  48052          0
> 
> vmstat -i is kinda painful for this (I discussed it with jhb@ in the
> past: the kernel just appends "+" to the name to indicate "many
> things using this IRQ").
> 
> I have absolutely no USB devices attached to the system (meaning there are
> USB controllers and ports, yeah, but nothing attached to any of them).
> The keyboard is PS/2.  All disks are on ahci0 (no disks currently
> attached to siis0).
> 
> As for dhcpd: I don't know how that'd be responsible.  If I stop the
> process entirely I still see the problem.
> 
> I can provide some more ktrdumps, along with turning off as many daemons
> + cron jobs as I can, if you feel that'd be helpful.
> 
> Likewise I can provide an ACPI DSDT dump if that would be useful (maybe
> to someone else).
> 
> I haven't tried booting the box in single-user and letting it sit there
> to see if anything shows up there.
> 
> In the interim I wrote the Perl script I mentioned in my mail to Steve.
> When the load shoots up, there is literally no field in "vmstat -s"
> that consistently shows a humongous increase (or decrease).  I'd say
> 95% of the time when there's a sudden load jump, I can't correlate any
> of those statistics with it.  It's a pretty "meh" script, but it does
> the job of showing deltas between vmstat -s runs and indicating
> visually when there's a jump in load average (1m avg).  It requires a
> VERY wide terminal (about 301 characters):
> 
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/load_vmstat.pl
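
For readers who can't fetch the script, here is a minimal Python sketch of the same idea. It is not Jeremy's Perl script: the polling interval (0.5 s), the jump threshold (0.05) and the names `parse_vmstat_s`, `counter_deltas`, `sample` and `watch` are all assumptions. It relies only on `vmstat -s` printing one "<count> <description>" pair per line and on `os.getloadavg()` for the 1-minute load average.

```python
import os
import re
import subprocess
import time

# Each `vmstat -s` line is "<count> <description>", e.g. "  123456 cpu context switches".
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_vmstat_s(text: str) -> dict[str, int]:
    """Map each vmstat -s description to its counter value."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Change in every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


def sample() -> tuple[float, dict[str, int]]:
    """Take one (1-minute load average, vmstat -s counters) snapshot."""
    out = subprocess.run(["vmstat", "-s"], capture_output=True, text=True, check=True).stdout
    return os.getloadavg()[0], parse_vmstat_s(out)


def watch(interval: float = 0.5, jump: float = 0.05, top: int = 5) -> None:
    """Poll forever; when the 1m load rises by >= `jump`, print the largest counter deltas."""
    load, prev = sample()
    while True:
        time.sleep(interval)
        new_load, cur = sample()
        if new_load - load >= jump:
            deltas = counter_deltas(prev, cur)
            biggest = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
            print(f"load {load:.2f} -> {new_load:.2f}")
            for name, delta in biggest:
                print(f"  {delta:+14d}  {name}")
        load, prev = new_load, cur
```

Call `watch()` on the affected box and leave it running. Ranking by absolute delta keeps each report to a few lines instead of a 301-column table, at the cost of hiding small counters that might still matter.
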
> 
> Some example output is here (the red+bold highlighting of the line
> obviously doesn't show up in plain text):
> 
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/example_data.txt
> 
> Load jumps at the following time indexes (in seconds):
> 
> 124.0  (from 0.02 to 0.10, load delta: 0.08)
> 153.0  (from 0.06 to 0.14, load delta: 0.08, time delta: 29.0 sec)
> 178.5  (from 0.10 to 0.17, load delta: 0.07, time delta: 25.5 sec)
> 217.0  (from 0.09 to 0.17, load delta: 0.08, time delta: 38.5 sec)
> 236.0  (from 0.12 to 0.19, load delta: 0.07, time delta: 19.0 sec)
> 244.0  (from 0.17 to 0.24, load delta: 0.07, time delta:  8.0 sec)
> 259.0  (from 0.20 to 0.27, load delta: 0.07, time delta: 15.0 sec)
> 284.5  (from 0.19 to 0.25, load delta: 0.06, time delta: 25.5 sec)
> 310.0  (from 0.18 to 0.25, load delta: 0.07, time delta: 25.5 sec)
> 341.5  (from 0.27 to 0.33, load delta: 0.06, time delta: 31.5 sec)
> 
> Some of these could be due to cron jobs I run (though they really aren't
> that intensive on disk, CPU, or memory), but there's a pretty consistent
> pattern going on there load-wise.  The reason I noted time deltas was
> to watch for "periodic tasks", e.g. ZFS txg flush.  But this seems to
> have a little bit more variance than that.
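
The time and load deltas in the jump list above can be re-derived from the raw (time, from, to) values, and a couple of summary numbers make the "more variance" point concrete. This is a minimal Python sketch: the data are copied from the list above, while the choice of mean, sample standard deviation and coefficient of variation as regularity measures is an assumption, not something from the thread. A strictly periodic source (for example the ZFS txg sync, governed by the `vfs.zfs.txg.timeout` sysctl) would be expected to give intervals clustered tightly around one value or its multiples.

```python
import statistics

# (time index in seconds, 1m load before, 1m load after) for each jump listed above.
JUMPS = [
    (124.0, 0.02, 0.10),
    (153.0, 0.06, 0.14),
    (178.5, 0.10, 0.17),
    (217.0, 0.09, 0.17),
    (236.0, 0.12, 0.19),
    (244.0, 0.17, 0.24),
    (259.0, 0.20, 0.27),
    (284.5, 0.19, 0.25),
    (310.0, 0.18, 0.25),
    (341.5, 0.27, 0.33),
]

times = [t for t, _, _ in JUMPS]
# Load delta per jump, rounded to the two decimals load averages are shown with.
load_deltas = [round(after - before, 2) for _, before, after in JUMPS]
# Seconds between consecutive jumps.
intervals = [b - a for a, b in zip(times, times[1:])]

mean = statistics.mean(intervals)
stdev = statistics.stdev(intervals)  # sample standard deviation

print(f"load deltas: {load_deltas}")
print(f"intervals: {intervals}")
print(f"mean {mean:.1f}s, stdev {stdev:.1f}s, min {min(intervals)}s, max {max(intervals)}s")
print(f"coefficient of variation: {stdev / mean:.2f}")
```

With the numbers above this gives a mean spacing of about 24.2 s, a sample standard deviation of about 9.1 s (intervals ranging from 8.0 s to 38.5 s) and a coefficient of variation of about 0.38, which fits the impression of a fairly consistent pattern with more jitter than a fixed timer would show.
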
> 
> It's just that none of the vmstat -s statistics change rapidly alongside
> the load.  But I'm sure there are VM bits that aren't tracked in vmstat.

Not sure if it's in stable/10, but there was some talk about making
ZFS use lz4 for some things by default; I wonder if that might have
something to do with it?

    Regards
    Steve

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E5D732C440140B9AEE204B91E5B120E>
