From owner-freebsd-stable@FreeBSD.ORG Sun Jul 20 08:54:40 2014
Date: Sun, 20 Jul 2014 01:54:38 -0700
From: Jeremy Chadwick <jdc@koitsu.org>
To: Adrian Chadd
Cc: FreeBSD Stable Mailing List <freebsd-stable@freebsd.org>
Subject: Re: Consistently "high" CPU load on 10.0-STABLE
Message-ID: <20140720085438.GA59359@icarus.home.lan>
References: <20140720062413.GA56318@icarus.home.lan>
User-Agent: Mutt/1.5.23 (2014-03-12)
List-Id: Production branch of FreeBSD source code

Okay, so I had to install FreeBSD on a VM (on a different box) to deal
with Xorg and all that jazz, then did the following on the bare-metal
system:

- Let it sit idle (barring cronjobs and existing daemons) for about
  10 minutes
- sysctl debug.ktr.mask=0
- Waited 10-15 seconds
- ktrdump -ctq > ktr.out
- sysctl debug.ktr.mask=0x20000000 (what the value was originally)
- scp'd ktr.out to the VM box
- On the VM in X: python /usr/src/tools/sched/schedgraph.py ktr.out 2.8

What I end up with is an application that is a bit difficult for me to
grasp. It indicates a time frame spanning 256.47 seconds, during which
KTR collected 247735 events from a total of 83 sources. I can, of
course, scroll through all those sources (vertically), but I'm not
really sure what's being graphed on a per-event basis. What the graphs
(individually) represent on a vertical scale, I'm not sure.
Horizontally they seem to be "sectionalised" in some manner, like into
blocks (possibly of time?), but I can't tell if that's "how long
something was running for" and whether that actually correlates with
CPU load or not.

This is very hard to explain in text, quite honestly, and I can't find
a single example of how to use this tool anywhere online. rwatson's
site has some vague information (that also seems outdated, particularly
the bit about shoving the ktrdump output through sort -n).

I've put a screenshot up of the relevant window, specifically the
"CPU n load" parts. Part of me wonders if the repeated "spikes"
(especially on CPU 0?)
are indicators of what I'm experiencing, but I really don't know:

http://jdc.koitsu.org/freebsd/releng10_perf_issue/sched01.png

I'm happy to provide the ktr.out if it's of any use, by the way.

My KTR kernel configuration settings on the bare-metal box are:

options KTR
options KTR_ENTRIES=262144
options KTR_COMPILE=(KTR_SCHED)
options KTR_MASK=(KTR_SCHED)

which, based on reading the python script vs. what's on the web, seem
to be what's generally desired.

I suppose I can try doing things like shutting off all the daemons I
normally run and then see if the problem goes away (and if so, try to
track it down to a single daemon), but like I said, I don't really see
anything userland-process-wise suddenly start taking up CPU time.
Whatever it is, it's "heavy" enough to cause the load to go from 0.03
to 0.24 or so within a few seconds and then settle down again. Part of
me wonders if it's ZFS (periodic txg flushing or something).

Oh, one thing I did find manually with top -S -H -b 999999 on two
different boxes: the "swapper" thread seems interesting, and I'll
explain why.

RELENG_10 box:

  1:49AM  up 5:52, 1 user, load averages: 0.32, 0.16, 0.10
    0 root  -16  0  0K 4912K swapin  0  1:04 0.00% [kernel{swapper}]

RELENG_9 box:

  1:49AM  up 39 days, 8:19, 1 user, load averages: 0.04, 0.06, 0.02
    0 root  -16  0  0K  160K swapin  0  0:55 0.00% [kernel{swapper}]

I don't know what the "swapper" thread is for or doing (I assume it's
not related to pagedaemon?), but I find it interesting that the
RELENG_10 system has already accumulated 1:04 worth of system time (I
think that's 1 minute 4 seconds?) despite having been up for under 6
hours and not using any swap, while a different RELENG_9 box that's
been up for 39 days and doing a lot more I/O (receiving mail, logging
things, running a public web server with logs, etc.) has only used
0:55 (and is actually using 104MBytes of swap).

TL;DR -- I wonder if it's "swapper" that's doing it.
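As a back-of-the-envelope check on those two top samples (this is pure
arithmetic on the uptimes and CPU times quoted above, not a measurement):

```python
# Rough duty-cycle comparison of the kernel{swapper} thread on the two
# boxes, using the uptime and accumulated CPU time from the top output
# above. Just arithmetic on the quoted numbers.

def duty_cycle(cpu_seconds, uptime_seconds):
    """Fraction of wall-clock uptime the thread has spent on-CPU."""
    return cpu_seconds / uptime_seconds

# RELENG_10 box: up 5:52 (hh:mm), swapper at 1:04 (mm:ss)
releng10 = duty_cycle(1 * 60 + 4, 5 * 3600 + 52 * 60)

# RELENG_9 box: up 39 days 8:19, swapper at 0:55
releng9 = duty_cycle(55, 39 * 86400 + 8 * 3600 + 19 * 60)

print(f"RELENG_10 swapper: {releng10:.4%} of uptime")   # ~0.30%
print(f"RELENG_9  swapper: {releng9:.4%} of uptime")    # ~0.0016%
print(f"ratio: ~{releng10 / releng9:.0f}x")             # ~187x
```

On these numbers the RELENG_10 swapper is accumulating roughly two
orders of magnitude more CPU time per second of uptime than the
RELENG_9 one, which at least is consistent with the hunch above.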
In the schedgraph, swapper shows up in little chunks of 10-second
durations, but that may be normal. :/

On Sun, Jul 20, 2014 at 12:13:05AM -0700, Adrian Chadd wrote:
> Hi,
>
> I don't know how to do this with dtrace, but take a look at
> tools/sched/schedgraph.py and enable KTR to get some trace records.
>
> KTR logs the scheduler activity -and- the loadav with it.
>
> -a
>
> On 19 July 2014 23:24, Jeremy Chadwick wrote:
> > (Please keep me CC'd as I'm not subscribed to freebsd-stable@)
> >
> > Today I took the liberty of upgrading my main home server from
> > 9.3-STABLE (r268785) to 10.0-STABLE (r268894). The upgrade consisted
> > of doing a fresh install of 10.0-STABLE on a brand new unused SSD.
> > Most everything went as planned, barring a couple of ports-related
> > anomalies, and I was fairly impressed by the fact that buildworld
> > times had dropped to 27 minutes and buildkernel to 4 minutes with
> > clang (something I'd been avoiding like the plague for a long while).
> > Kudos.
> >
> > But after an hour or so, I noticed a consistent (i.e. reproducible)
> > trend: the system load average tends to hang around 0.10 to 0.15 all
> > the time. There are times where the load drops to 0.03 or 0.04, but
> > then something kicks it back up to 0.15 or 0.20, and then it slowly
> > levels out again (over the course of a few minutes) and repeats.
> >
> > Obviously this is normal behaviour for a system when something is
> > going on periodically. So I figured it might have been a userland
> > process behaving differently under 10.x than 9.x. I let
> > top -a -S -s 1 run and paid very close attention to it for several
> > minutes. Nothing. It doesn't appear to be something userland -- it
> > appears to be something kernel-level, but nothing in top -S shows up
> > as taking up any CPU time other than "[idle]", so I have no idea what
> > might be doing it.
> >
> > The box isn't doing anything like routing network traffic/NAT; it's
> > pure IPv4 (IPv6 disabled in world and kernel, and my home network
> > does basically no IPv6) and sits idle most of the time fetching mail.
> > It does use ZFS, but not for /, swap, /var, /tmp, or /usr.
> >
> > vmstat -i doesn't particularly show anything awful. All the
> > cpuX:timer entries tend to fluctuate in rate, usually 120-200 or so;
> > I'd expect an interrupt storm to be showing something in the 1000+
> > range.
> >
> > The only thing I can think of is the fact that the SSD being used has
> > no 4K quirk entry in the kernel (and its ATA IDENTIFY responds with
> > 512 logical, 512 physical, even though we know it's 4K). The
> > partitions are all 1MB-aligned regardless.
> >
> > This is all bare-metal, by the way -- no virtualisation involved.
> >
> > I do have DTrace enabled/built on this box, but I have absolutely no
> > clue how to go about profiling things. For example, maybe output of
> > this sort would be helpful (but I've no idea how to get it):
> >
> > http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html
> >
> > I'm certain I didn't see this behaviour in 9.x, so I'd be happy to
> > try and track it down if I had a little bit of hand-holding.
> >
> > I've put all the things I can think of that might be relevant to
> > "system config/tuning bits" up here:
> >
> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/
> >
> > I should note my kernel config is slightly inaccurate (I've removed
> > some stuff from the file in an attempt to rebuild, but building world
> > prior to kernel failed due to r268896 breaking world -- anyone
> > subscribed here has already seen the Jenkins job of that ;-) ).
> >
> > Thanks.
> >
> > --
> > | Jeremy Chadwick                                 jdc@koitsu.org |
> > | UNIX Systems Administrator            http://jdc.koitsu.org/  |
> > | Making life hard for others since 1977.           PGP 4BD6C0CB |
> >
> > _______________________________________________
> > freebsd-stable@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

-- 
| Jeremy Chadwick                                 jdc@koitsu.org |
| UNIX Systems Administrator            http://jdc.koitsu.org/  |
| Making life hard for others since 1977.           PGP 4BD6C0CB |
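[Editor's footnote on the load numbers chased in this thread: the
1-minute load average is an exponentially weighted moving average of the
runnable-thread count, which the kernel updates roughly every 5 seconds.
The toy floating-point model below is an approximation (the kernel
itself uses fixed-point arithmetic), but it shows how a single extra
runnable kernel thread for only ~15 seconds is enough to kick an idle
box from 0.03 to roughly 0.24, matching the behaviour described above.]

```python
import math

# Simplified model of the 1-minute load average: an EWMA of the number
# of runnable threads, folded in every ~5 seconds. Approximation only;
# the real kernel uses fixed-point math.

SAMPLE_INTERVAL = 5.0                        # seconds between updates
DECAY = math.exp(-SAMPLE_INTERVAL / 60.0)    # weight for the 1-min avg

def update(load, nrun):
    """Fold one sample of the runnable-thread count into the average."""
    return load * DECAY + nrun * (1.0 - DECAY)

load = 0.03                  # idle baseline, as observed in the thread
for _ in range(3):           # ~15 seconds with one extra runnable thread
    load = update(load, 1)
print(f"after burst: {load:.2f}")    # ~0.24

for _ in range(12):          # ~60 seconds fully idle again
    load = update(load, 0)
print(f"after idle:  {load:.2f}")    # back under 0.10
```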