Date: Sun, 20 Jul 2014 20:09:49 +0100 From: "Steven Hartland" <killing@multiplay.co.uk> To: "Jeremy Chadwick" <jdc@koitsu.org> Cc: freebsd-stable@freebsd.org Subject: Re: Consistently "high" CPU load on 10.0-STABLE Message-ID: <ED826825202341E58B71A3F718B60562@multiplay.co.uk> References: <20140720062413.GA56318@icarus.home.lan> <97EA8E571E634DBBAA70F7AA7F0DE97C@multiplay.co.uk> <20140720173524.GA67065@icarus.home.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
Hmm does vmstat -s indicate that 10 is swapping much more than 9? I know there was some issues in early 10 which resulted in it swapping more but I was under the impression this was fixed by: http://svnweb.freebsd.org/base?view=revision&revision=265944 http://svnweb.freebsd.org/base?view=revision&revision=265945 May be there's still and issue there? Regards Steve ----- Original Message ----- From: "Jeremy Chadwick" <jdc@koitsu.org> To: "Steven Hartland" <killing@multiplay.co.uk> Cc: <freebsd-stable@freebsd.org> Sent: Sunday, July 20, 2014 6:35 PM Subject: Re: Consistently "high" CPU load on 10.0-STABLE > Yes and no, heh... :-) > > Using top -a -H -S -z -s 1 and watching very very closely, what I end up > seeing is that occasionally syncer reaches WCPU percentages of around > 0.50% (or maybe higher) -- but when that happens, the actual load average > **does not** suddenly increase. > > The load just seems to go from like 0.01 or 0.02 to 0.12 or 0.20 > sporadically with no real evidence of "why" in top. Possibly this is > because I'm using -s 1 (one second update intervals) and whatever > happens is so fast/quick that it lasts less than a second / top doesn't > catch it, but still impacts the load? > > Anyway, the "top" processes per TIME in the above command are here: > > PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 17 root 16 - 0K 16K syncer 2 1:31 0.49% syncer > 0 root -16 0 0K 4912K swapin 0 1:04 0.00% kernel{swapper} > 0 root -92 0 0K 4912K - 2 0:17 0.00% kernel{em0 que} > 1767 root 20 0 57312K 10104K select 1 0:15 0.00% smbd > 12 root -60 - 0K 528K WAIT 0 0:13 0.00% intr{swi4: clock} > 643 dhcpd 20 0 24524K 13044K select 1 0:07 0.00% dhcpd > 12 root -88 - 0K 528K WAIT 1 0:04 0.00% intr{irq259: ahci0:ch} > 14 root -16 - 0K 16K - 2 0:04 0.00% rand_harvestq > 58515 jdc 20 0 37028K 6196K select 2 0:03 0.00% sshd > 420 bind 20 0 70872K 36552K kqread 0 0:02 0.00% named{named} > 19 root 20 - 0K 16K sdflus 2 0:02 0.00% softdepflush > 420 bind 20 0 70872K 36552K uwait 3 0:02 0.00% named{named} > > This is with a system uptime of 14 hours. > > Comparatively, the RELENG_9 box I have (although it's on a VPS), which > does a lot more in general and has an uptime of 39 days, shows something > like this: > > PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 12 root -60 - 0K 224K WAIT 0 132:49 0.00% intr{swi4: clock} > 0 root -92 0 0K 160K - 0 89:01 0.00% kernel{em0 taskq} > 17 root 16 - 0K 16K syncer 0 45:46 0.00% syncer > 12 root -88 - 0K 224K WAIT 1 16:31 0.00% intr{irq14: ata0} > 12 root -60 - 0K 224K WAIT 1 12:52 0.00% intr{swi4: clock} > 13 root -8 - 0K 48K - 1 8:03 0.00% geom{g_down} > 15490 halbot 20 0 76172K 22532K select 0 6:34 0.00% perl > 593 bind 20 0 92288K 23740K uwait 1 4:39 0.00% named{named} > 593 bind 20 0 92288K 23740K uwait 0 4:36 0.00% named{named} > > So syncer looks like it might be about right for both systems, but > "swapper" (still not sure what that is exactly) sure seems to be much > more busy on the RELENG_10 box doing nothing vs. the RELENG_9 box. > Here's the swapper line from the RELENG_9 box: > > $ top -a -H -S -b 9999 | grep swapper > 0 root -16 0 0K 160K swapin 0 0:55 0.00% kernel{swapper} > > It's the only suspect I have at this point but it's not very good > evidence of anything. :/ > > Maybe I can use while : ; do top -a -S -H -z -b 99999 >> somelog.txt ; > sleep 0.25 ; done and let that run for a while in hopes of catching > something? But I also worry that such a test would actually impact the > load itself. > > On Sun, Jul 20, 2014 at 03:26:02PM +0100, Steven Hartland wrote: >> If you add -H -z to your top command does anything stand out? >> >> Regards >> Steve >> ----- Original Message ----- >> From: "Jeremy Chadwick" <jdc@koitsu.org> >> To: <freebsd-stable@freebsd.org> >> Sent: Sunday, July 20, 2014 7:24 AM >> Subject: Consistently "high" CPU load on 10.0-STABLE >> >> >> > (Please keep me CC'd as I'm not subscribed to freebsd-stable@) >> > >> > Today I took the liberty of upgrading my main home server from >> > 9.3-STABLE (r268785) to 10.0-STABLE (r268894). The upgrade consisted of >> > doing a fresh install of 10.0-STABLE on a brand new unused SSD. Most >> > everything went as planned, barring a couple ports-related anomalies, >> > and I seemed fairly impressed by the fact that buildworld times had >> > dropped to 27 minutes and buildkernel to 4 minutes with clang (something >> > I'd been avoiding like the plague for a long while). Kudos. >> > >> > But after an hour or so, I noticed a consistent (i.e. reproducible) >> > trend: the system load average tends to hang around 0.10 to 0.15 all the >> > time. There are times where the load drops to 0.03 or 0.04 but then >> > something kicks it back up to 0.15 or 0.20 and then it slowly levels out >> > again (over the course of a few minutes) then repeats. >> > >> > Obviously this is normal behaviour for a system when something is going >> > on periodically. So I figured it might have been a userland process >> > behaving differently under 10.x than 9.x. I let top -a -S -s 1 run and >> > paid very very close attention to it for several minutes. Nothing. It >> > doesn't appear to be something userland -- it appears to be something >> > kernel-level, but nothing in top -S shows up as taking up any CPU time >> > other than "[idle]" so I have no idea what might be doing it. >> > >> > The box isn't doing anything like routing network traffic/NAT, it's pure >> > IPv4 (IPv6 disabled in world and kernel, and my home network does >> > basically no IPv6) and sits idle most of the time fetching mail. It >> > does use ZFS, but not for /, swap, /var, /tmp, or /usr. >> > >> > vmstat -i doesn't particularly show anything awful. All the cpuX:timer >> > entries tend to fluctuate in rate, usually 120-200 or so; I'd expect an >> > interrupt storm to be showing something in the 1000+ range. >> > >> > The only thing I can think of is the fact that the SSD being used has no >> > 4K quirk entry in the kernel (and its ATA IDENTIFY responds with 512 >> > logical, 512 physical, even though we know it's 4K). The partitions are >> > all 1MB-aligned regardless. >> > >> > This is all bare-metal, by the way -- no virtualisation involved. >> > >> > I do have DTrace enabled/built on this box but I have absolutely no clue >> > how to go about profiling things. For example maybe output of this sort >> > would be helpful (but I've no idea how to get it): >> > >> > http://lists.freebsd.org/pipermail/freebsd-stable/2014-July/079276.html >> > >> > I'm certain I didn't see this behaviour in 9.x so I'd be happy to try >> > and track it down if I had a little bit of hand-holding. >> > >> > I've put all the things I can think of that might be relevant to "system >> > config/tuning bits" up here: >> > >> > http://jdc.koitsu.org/freebsd/releng10_perf_issue/ >> > >> > I should note my kernel config is slightly inaccurate (I've removed some >> > stuff from the file in attempt to rebuild, but building world prior to >> > kernel failed due to r268896 breaking world, but anyone subscribed here >> > has already seen the Jenkins job of that ;-) ). >> > >> > Thanks. >> > >> > -- >> > | Jeremy Chadwick jdc@koitsu.org | >> > | UNIX Systems Administrator http://jdc.koitsu.org/ | >> > | Making life hard for others since 1977. PGP 4BD6C0CB | >> > >> > _______________________________________________ >> > freebsd-stable@freebsd.org mailing list >> > http://lists.freebsd.org/mailman/listinfo/freebsd-stable >> > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" >> > > > -- > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Making life hard for others since 1977. PGP 4BD6C0CB | > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ED826825202341E58B71A3F718B60562>