From owner-freebsd-stable@FreeBSD.ORG Sun Jul 20 23:16:57 2014 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 880CEE3D; Sun, 20 Jul 2014 23:16:57 +0000 (UTC) Received: from smtp1.multiplay.co.uk (smtp1.multiplay.co.uk [85.236.96.35]) by mx1.freebsd.org (Postfix) with ESMTP id 3AB2827B8; Sun, 20 Jul 2014 23:16:56 +0000 (UTC) Received: by smtp1.multiplay.co.uk (Postfix, from userid 65534) id 0E8D720E7088C; Sun, 20 Jul 2014 23:16:54 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.multiplay.co.uk X-Spam-Level: ** X-Spam-Status: No, score=2.2 required=8.0 tests=AWL,BAYES_00,DOS_OE_TO_MX, FSL_HELO_NON_FQDN_1,HELO_NO_DOMAIN,RDNS_DYNAMIC,STOX_REPLY_TYPE autolearn=no version=3.3.1 Received: from r2d2 (82-69-141-170.dsl.in-addr.zen.co.uk [82.69.141.170]) by smtp1.multiplay.co.uk (Postfix) with ESMTPS id 0DD2B20E7088A; Sun, 20 Jul 2014 23:16:50 +0000 (UTC) Message-ID: <3E5D732C440140B9AEE204B91E5B120E@multiplay.co.uk> From: "Steven Hartland" To: "Jeremy Chadwick" , "Adrian Chadd" References: <20140720062413.GA56318@icarus.home.lan> <97EA8E571E634DBBAA70F7AA7F0DE97C@multiplay.co.uk> <20140720173524.GA67065@icarus.home.lan> <20140720201655.GA70545@icarus.home.lan> <20140720225845.GA81033@icarus.home.lan> Subject: Re: Consistently "high" CPU load on 10.0-STABLE Date: Mon, 21 Jul 2014 00:16:48 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5931 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Cc: FreeBSD Stable Mailing List X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 20 Jul 2014 23:16:57 -0000 ----- Original Message ----- From: "Jeremy Chadwick" To: "Adrian Chadd" Cc: "Steven Hartland" ; "FreeBSD Stable Mailing List" Sent: Sunday, July 20, 2014 11:58 PM Subject: Re: Consistently "high" CPU load on 10.0-STABLE > On Sun, Jul 20, 2014 at 03:09:55PM -0700, Adrian Chadd wrote: >> hi, >> >> it looks like a whole lot of things are waking up at the same time: >> >> * dhcpd >> * em >> * usb devices >> >> So, do you have some shared interrupts going on here? That seems to be >> what's causing things to all wake up all at once. > > I forget how to get an interrupt mapping from the I/O APIC, but dmesg > indicates the following. Sorted by IRQ order so that you can tell > what's associated with what, and also RELENG_9 vs. RELENG_10 (because I > do have an old dmesg.today from this box running RELENG_9). All the > IRQs match up: > > dev RELENG_10 RELENG_9 > -------- ------------- ------------- > ioapic0 IRQs 0 to 23 IRQs 0 to 23 (same) > ioapic1 IRQs 24 to 47 IRQs 24 to 47 (same) > attimer0 IRQ 0 IRQ 0 (same) > atkbdc0 IRQ 1 IRQ 1 (same) > atkbd0 IRQ 1 IRQ 1 (same) > uart1 IRQ 3 IRQ 3 (same) > uart0 IRQ 4 IRQ 4 (same) > atrtc0 IRQ 8 IRQ 8 (same) > em0 IRQ 16 IRQ 16 (same) > pcib1 IRQ 16 IRQ 16 (same) > pcib3 IRQ 16 IRQ 16 (same) > pcib4 IRQ 16 IRQ 16 (same) > uhci0 IRQ 16 IRQ 16 (same) > ahci0 IRQ 17 IRQ 17 (same) > em1 IRQ 17 IRQ 17 (same) > ichsmb0 IRQ 17 IRQ 17 (same) > pcib5 IRQ 17 IRQ 17 (same) > uhci1 IRQ 17 IRQ 17 (same) > ehci0 IRQ 18 IRQ 18 (same) > uhci2 IRQ 18 IRQ 18 (same) > uhci5 IRQ 18 IRQ 18 (same) > siis0 IRQ 21 IRQ 21 (same) > uhci4 IRQ 22 IRQ 22 (same) > ehci1 IRQ 23 IRQ 23 (same) > uhci3 IRQ 23 IRQ 23 (same) > > And the higher-numbered IRQs per vmstat -i. I only have this for > RELENG_10 however: > > irq256: em0 1848856 26 > irq259: ahci0:ch0 273086 3 > irq260: ahci0:ch1 9990 0 > irq261: ahci0:ch2 48514 0 > irq262: ahci0:ch3 48046 0 > irq263: ahci0:ch4 48258 0 > irq264: ahci0:ch5 48052 0 > > vmstat -i for this is kinda painful (discussed this with jhb@ in the > past, re: kernel just appending "+" to the string to indicate "many > things using this IRQ"). > > I have absolute no USB devices attached to the system (meaning there are > USB controllers and ports, yeah, but nothing attached to any of them). > The keyboard is PS/2. All disks are on ahci0 (no disks currently > attached to siis0). > > As for dhcpd: I don't know how that'd be responsible. If I stop the > process entirely I still see the problem. > > I can provide some more ktrdumps, along with turning off as many daemons > + cron jobs as I can, if you feel that'd be helpful. > > Likewise I can provide an ACPI DSDT dump if that would be useful (maybe > to someone else). > > I haven't tried booting the box in single-user and letting it sit there > to see if anything shows up there. > > In the interim I wrote the perl script I mentioned in my mail to Steve. > When the load shoots up, there is literally no field in "vmstat -s" > that shows a humongous increase (or decrease) consistently. Meaning > I'd say 95% of the time when there's a sudden load jump, none of those > statistics I can correlate with it. It's a pretty "meh" script, but > it does the job of showing deltas between vmstat -s runs and indicating > visually when there's a jump in load average (1m avg). It requires a > VERY wide terminal (about 301 characters): > > http://jdc.koitsu.org/freebsd/releng10_perf_issue/load_vmstat.pl > > Some example output is here (obviously can't see the red+bold > highlighting of the line): > > http://jdc.koitsu.org/freebsd/releng10_perf_issue/example_data.txt > > Load jumps at the following time indexes: > > 124.0 (from 0.02 to 0.10, load delta: 0.08) > 153.0 (from 0.06 to 0.14, load delta: 0.08, time delta: 29.0 sec) > 178.5 (from 0.10 to 0.17, load delta: 0.07, time delta: 25.5 sec) > 217.0 (from 0.09 to 0.17, load delta: 0.08, time delta: 38.5 sec) > 236.0 (from 0.12 to 0.19, load delta: 0.07, time delta: 19.0 sec) > 244.0 (from 0.17 to 0.24, load delta: 0.07, time delta: 8.0 sec) > 259.0 (from 0.20 to 0.27, load delta: 0.07, time delta: 15.0 sec) > 284.5 (from 0.19 to 0.25, load delta: 0.06, time delta: 25.5 sec) > 310.0 (from 0.18 to 0.25, load delta: 0.07, time delta: 25.5 sec) > 341.5 (from 0.27 to 0.33, load delta: 0.06, time delta: 31.5 sec) > > Some of these could be due to cron jobs I run (though they really aren't > that intensive on disk, CPU, or memory), but there's a pretty consistent > pattern going on there load-wise. The reason noted time deltas was > watching for "periodic tasks", e.g. ZFS txg flush. But this seems to > have a little bit more variance. > > It's just that none of the vmstat -s statistics change rapidly alongside > the load. But I'm sure there are VM bits that aren't tracked in vmstat. Not sure if its in stable/10 but there was some talk about making ZFS use lz4 for some things by default, wonder if that might have something to do with it? Regards Steve