From: John Baldwin
To: freebsd-hackers@freebsd.org
Cc: Allan Jude, Ian Lepore
Subject: Re: stopped processes using cpu?
Date: Wed, 20 Aug 2014 11:38:40 -0400
Message-Id: <201408201138.40228.jhb@freebsd.org>
References: <10AEB4BC-B1B3-4312-A36C-ECE33EC56805@kientzle.com> <1408540626.1150.1.camel@revolution.hippie.lan>
In-Reply-To: <1408540626.1150.1.camel@revolution.hippie.lan>

On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > On Aug 19, 2014, at 12:28 PM, Allan Jude wrote:
> >
> > > On 2014-08-19 15:21, Dieter BSD wrote:
> > >> 8.2 on amd64
> > >> Top(1) with no arguments reports that some firefox processes are using cpu
> > >> despite being stopped (via kill -stop pid) for at least several hours.
> > >> Adding -C doesn't change the numbers. Ps(1) reports the same.
> > >> Interestingly, a firefox that isn't stopped is (correctly?) reported as
> > >> using 0 cpu. The 100% idle should be correct, but who knows.
> > >>
> > >> last pid: 51932;  load averages:  0.07,  0.99,  1.42   up 14+19:02:56  08:48:28
> > >> 267 processes: 1 running, 138 sleeping, 128 stopped
> > >> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > >> Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > >> Swap: 8965M Total, 560K Used, 8965M Free
> > >>
> > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > >> 44188 a           9  44    0   303M   187M STOP   113:19 13.43% firefox-bin
> > >> 92986 b          11  44    0   164M 62848K STOP     0:18  5.03% firefox-bin
> > >> 16507 c          11  44    0   189M 88976K STOP     0:13  0.24% firefox-bin
> > >>  2265 root        1  44    0   248M   193M select 625:38  0.00% Xorg
> > >> 51271 d          10  44    0   233M   128M ucond   12:12  0.00% firefox-bin
> > >>
> > >
> > > I wonder if jhb@'s new top code solves this. He adjusted the way CPU
> > > usage is tracked to be more responsive, and not based on averages
> >
> > I wonder if jhb@'s new top code fixes the whacky WCPU values we've been
> > seeing on FreeBSD/ARM. (1713% CPU is a little hard to believe on a
> > single-core board ;-).
> >
> > Tim
> >
>
> *Fixes* it? I've been under the impression those changes caused it. I
> certainly never saw 1000%+ numbers in top until very recently.

Yes, if it's a recent change then mine are to blame.
In both cases the numbers are imprecise. The older code still in stable@
(as in the OP) takes a long time to ramp up and down. So in this case the
processes are stopped (no, there's no rootkit), but the scheduler takes a
long time to factor that into its decayed %CPU computation.

In the "new" code, the problem is that fetching the kinfo_proc and the
current timestamp for that kinfo_proc is not atomic. I have thought about
"fixing" that by embedding a new timeval in kinfo_proc that is stamped with
the time the individual kinfo_proc is generated. This would (I believe)
alleviate the noise in the new code, as the delta in walltime at the
"bottom" of the ratio would then correspond to the delta in runtime on the
"top".

However, trying to store a timeval in kinfo_proc is quite tricky, as all the
available fields are things like ints and longs. I could perhaps split it
up into two longs, which is kind of fugly. Another option would be to just
generate a single long that holds raw nanoseconds of uptime and store that
(wrapping would be ok since I would only care about deltas).

-- 
John Baldwin
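
The ratio described above (delta in runtime on the "top", delta in walltime
on the "bottom") and the wrap-tolerant nanosecond stamp can be sketched
roughly as follows. This is only an illustration in Python, not top(1) or
kernel code: the function names, the 64-bit counter width, and the use of
microseconds for runtime are all assumptions made for the example.

```python
# Illustrative sketch only. Names, units, and the 64-bit width are
# assumptions for this example, not the actual top(1) or kernel code.

MASK64 = (1 << 64) - 1  # assumed width of the wrapping uptime counter


def wrapped_delta(prev, cur, mask=MASK64):
    """Elapsed ticks between two samples of a wrapping unsigned counter.

    Masking the difference gives the right answer even if the counter
    wrapped once between the two samples.
    """
    return (cur - prev) & mask


def pct_cpu(prev_runtime_us, cur_runtime_us, prev_stamp_ns, cur_stamp_ns):
    """%CPU for one interval: runtime delta over walltime delta."""
    wall_ns = wrapped_delta(prev_stamp_ns, cur_stamp_ns)
    if wall_ns == 0:
        return 0.0  # no time elapsed, nothing to report
    run_ns = (cur_runtime_us - prev_runtime_us) * 1000
    return 100.0 * run_ns / wall_ns


# A process that ran for 0.5 s between two records generated 1 s apart.
# Using the stamp taken when each record was generated gives 50%.
stamped = pct_cpu(0, 500_000, 0, 1_000_000_000)

# If the reader instead takes its own timestamps, and the first one lands
# 0.2 s after the record was generated, the denominator no longer matches
# the numerator and the result is inflated.
skewed = pct_cpu(0, 500_000, 200_000_000, 1_000_000_000)

# The same 1 s interval straddling a counter wrap still yields 50%.
wrapped = pct_cpu(0, 500_000, MASK64 - 499_999_999, 500_000_000)
```

The skewed case is the kind of noise the per-record stamp is meant to
remove: the runtime delta is unchanged, but the walltime delta no longer
covers the same interval.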