From owner-freebsd-stable@FreeBSD.ORG  Sun Jul 20 23:16:57 2014
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 880CEE3D;
 Sun, 20 Jul 2014 23:16:57 +0000 (UTC)
Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35])
 by mx1.freebsd.org (Postfix) with ESMTP id 3AB2827B8;
 Sun, 20 Jul 2014 23:16:56 +0000 (UTC)
Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534)
 id 0E8D720E7088C; Sun, 20 Jul 2014 23:16:54 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
 smtp1.multiplay.co.uk
X-Spam-Level: **
X-Spam-Status: No, score=2.2 required=8.0 tests=AWL,BAYES_00,DOS_OE_TO_MX,
 FSL_HELO_NON_FQDN_1,HELO_NO_DOMAIN,RDNS_DYNAMIC,STOX_REPLY_TYPE autolearn=no
 version=3.3.1
Received: from r2d2 (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170])
 by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 0DD2B20E7088A;
 Sun, 20 Jul 2014 23:16:50 +0000 (UTC)
Message-ID: <3E5D732C440140B9AEE204B91E5B120E@multiplay.co.uk>
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Jeremy Chadwick" <jdc@koitsu.org>,
	"Adrian Chadd" <adrian@freebsd.org>
References: <20140720062413.GA56318@icarus.home.lan>
 <97EA8E571E634DBBAA70F7AA7F0DE97C@multiplay.co.uk>
 <20140720173524.GA67065@icarus.home.lan>
 <ED826825202341E58B71A3F718B60562@multiplay.co.uk>
 <20140720201655.GA70545@icarus.home.lan>
 <CAJ-Vmo=O-OH-Ljk1u-SGF1L=jj=Lnwd_aQs0Bqc5Mi03jZvkuw@mail.gmail.com>
 <20140720225845.GA81033@icarus.home.lan>
Subject: Re: Consistently "high" CPU load on 10.0-STABLE
Date: Mon, 21 Jul 2014 00:16:48 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
 reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.5931
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 20 Jul 2014 23:16:57 -0000


----- Original Message ----- 
From: "Jeremy Chadwick" <jdc@koitsu.org>
To: "Adrian Chadd" <adrian@freebsd.org>
Cc: "Steven Hartland" <killing@multiplay.co.uk>; "FreeBSD Stable Mailing List" <freebsd-stable@freebsd.org>
Sent: Sunday, July 20, 2014 11:58 PM
Subject: Re: Consistently "high" CPU load on 10.0-STABLE


> On Sun, Jul 20, 2014 at 03:09:55PM -0700, Adrian Chadd wrote:
>> hi,
>> 
>> it looks like a whole lot of things are waking up at the same time:
>> 
>> * dhcpd
>> * em
>> * usb devices
>> 
>> So, do you have some shared interrupts going on here? That seems to be
>> what's causing things to all wake up all at once.
> 
> I forget how to get an interrupt mapping from the I/O APIC, but dmesg
> indicates the following.  Sorted by IRQ order so that you can tell
> what's associated with what, and also RELENG_9 vs. RELENG_10 (because I
> do have an old dmesg.today from this box running RELENG_9).  All the
> IRQs match up:
> 
> dev       RELENG_10      RELENG_9
> --------  -------------  -------------
> ioapic0   IRQs  0 to 23  IRQs  0 to 23 (same)
> ioapic1   IRQs 24 to 47  IRQs 24 to 47 (same)
> attimer0  IRQ 0          IRQ 0 (same)
> atkbdc0   IRQ 1          IRQ 1 (same)
> atkbd0    IRQ 1          IRQ 1 (same)
> uart1     IRQ 3          IRQ 3 (same)
> uart0     IRQ 4          IRQ 4 (same)
> atrtc0    IRQ 8          IRQ 8 (same)
> em0       IRQ 16         IRQ 16 (same)
> pcib1     IRQ 16         IRQ 16 (same)
> pcib3     IRQ 16         IRQ 16 (same)
> pcib4     IRQ 16         IRQ 16 (same)
> uhci0     IRQ 16         IRQ 16 (same)
> ahci0     IRQ 17         IRQ 17 (same)
> em1       IRQ 17         IRQ 17 (same)
> ichsmb0   IRQ 17         IRQ 17 (same)
> pcib5     IRQ 17         IRQ 17 (same)
> uhci1     IRQ 17         IRQ 17 (same)
> ehci0     IRQ 18         IRQ 18 (same)
> uhci2     IRQ 18         IRQ 18 (same)
> uhci5     IRQ 18         IRQ 18 (same)
> siis0     IRQ 21         IRQ 21 (same)
> uhci4     IRQ 22         IRQ 22 (same)
> ehci1     IRQ 23         IRQ 23 (same)
> uhci3     IRQ 23         IRQ 23 (same)
> 
> And the higher-numbered IRQs per vmstat -i.  I only have this for
> RELENG_10 however:
> 
> irq256: em0                      1848856         26
> irq259: ahci0:ch0                 273086          3
> irq260: ahci0:ch1                   9990          0
> irq261: ahci0:ch2                  48514          0
> irq262: ahci0:ch3                  48046          0
> irq263: ahci0:ch4                  48258          0
> irq264: ahci0:ch5                  48052          0
> 
> vmstat -i for this is kinda painful (discussed this with jhb@ in the
> past, re: kernel just appending "+" to the string to indicate "many
> things using this IRQ").
> 
> I have absolute no USB devices attached to the system (meaning there are
> USB controllers and ports, yeah, but nothing attached to any of them).
> The keyboard is PS/2.  All disks are on ahci0 (no disks currently
> attached to siis0).
> 
> As for dhcpd: I don't know how that'd be responsible.  If I stop the
> process entirely I still see the problem.
> 
> I can provide some more ktrdumps, along with turning off as many daemons
> + cron jobs as I can, if you feel that'd be helpful.
> 
> Likewise I can provide an ACPI DSDT dump if that would be useful (maybe
> to someone else).
> 
> I haven't tried booting the box in single-user and letting it sit there
> to see if anything shows up there.
> 
> In the interim I wrote the perl script I mentioned in my mail to Steve.
> When the load shoots up, there is literally no field in "vmstat -s"
> that shows a humongous increase (or decrease) consistently.  Meaning
> I'd say 95% of the time when there's a sudden load jump, none of those
> statistics I can correlate with it.  It's a pretty "meh" script, but
> it does the job of showing deltas between vmstat -s runs and indicating
> visually when there's a jump in load average (1m avg).  It requires a
> VERY wide terminal (about 301 characters):
> 
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/load_vmstat.pl
> 
> Some example output is here (obviously can't see the red+bold
> highlighting of the line):
> 
> http://jdc.koitsu.org/freebsd/releng10_perf_issue/example_data.txt
> 
> Load jumps at the following time indexes:
> 
> 124.0  (from 0.02 to 0.10, load delta: 0.08)
> 153.0  (from 0.06 to 0.14, load delta: 0.08, time delta: 29.0 sec)
> 178.5  (from 0.10 to 0.17, load delta: 0.07, time delta: 25.5 sec)
> 217.0  (from 0.09 to 0.17, load delta: 0.08, time delta: 38.5 sec)
> 236.0  (from 0.12 to 0.19, load delta: 0.07, time delta: 19.0 sec)
> 244.0  (from 0.17 to 0.24, load delta: 0.07, time delta:  8.0 sec)
> 259.0  (from 0.20 to 0.27, load delta: 0.07, time delta: 15.0 sec)
> 284.5  (from 0.19 to 0.25, load delta: 0.06, time delta: 25.5 sec)
> 310.0  (from 0.18 to 0.25, load delta: 0.07, time delta: 25.5 sec)
> 341.5  (from 0.27 to 0.33, load delta: 0.06, time delta: 31.5 sec)
> 
> Some of these could be due to cron jobs I run (though they really aren't
> that intensive on disk, CPU, or memory), but there's a pretty consistent
> pattern going on there load-wise.  The reason noted time deltas was
> watching for "periodic tasks", e.g. ZFS txg flush.  But this seems to
> have a little bit more variance.
> 
> It's just that none of the vmstat -s statistics change rapidly alongside
> the load.  But I'm sure there are VM bits that aren't tracked in vmstat.

Not sure if its in stable/10 but there was some talk about making
ZFS use lz4 for some things by default, wonder if that might have
something to do with it?

    Regards
    Steve