Date: Wed, 03 Sep 2014 16:05:49 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Ian Lepore <ian@freebsd.org>, Allan Jude <allanjude@freebsd.org>
Subject: Re: stopped processes using cpu?
Message-ID: <1567020.dfiLmFunn8@ralph.baldwin.cx>
In-Reply-To: <201408201138.40228.jhb@freebsd.org>
References: <CAA3ZYrAzpxpFNST5ZT-zHvk4Gg38w-yH1dTQj53Fp_rM-hohaA@mail.gmail.com> <1408540626.1150.1.camel@revolution.hippie.lan> <201408201138.40228.jhb@freebsd.org>

On Wednesday, August 20, 2014 11:38:40 AM John Baldwin wrote:
> On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> > On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > > On Aug 19, 2014, at 12:28 PM, Allan Jude <allanjude@freebsd.org> wrote:
> > > > On 2014-08-19 15:21, Dieter BSD wrote:
> > > >> 8.2 on amd64
> > > >> Top(1) with no arguments reports that some firefox processes are
> > > >> using cpu despite being stopped (via kill -stop pid) for at least
> > > >> several hours. Adding -C doesn't change the numbers. Ps(1) reports
> > > >> the same. Interestingly, a firefox that isn't stopped is
> > > >> (correctly?) reported as using 0 cpu. The 100% idle should be
> > > >> correct, but who knows.
> > > >>
> > > >> last pid: 51932; load averages: 0.07, 0.99, 1.42 up 14+19:02:56 08:48:28
> > > >> 267 processes: 1 running, 138 sleeping, 128 stopped
> > > >> CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> > > >> Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > > >> Swap: 8965M Total, 560K Used, 8965M Free
> > > >>
> > > >>   PID USERNAME THR PRI NICE  SIZE    RES STATE    TIME   WCPU COMMAND
> > > >> 44188 a          9  44    0  303M   187M STOP   113:19 13.43% firefox-bin
> > > >> 92986 b         11  44    0  164M 62848K STOP     0:18  5.03% firefox-bin
> > > >> 16507 c         11  44    0  189M 88976K STOP     0:13  0.24% firefox-bin
> > > >>  2265 root       1  44    0  248M   193M select 625:38  0.00% Xorg
> > > >> 51271 d         10  44    0  233M   128M ucond   12:12  0.00% firefox-bin
> > > >
> > > > I wonder if jhb@'s new top code solves this.
> > > > He adjusted the way CPU usage is tracked to be more responsive, and
> > > > not based on averages
> > >
> > > I wonder if jhb@’s new top code fixes the whacky WCPU values we’ve
> > > been seeing on FreeBSD/ARM. (1713% CPU is a little hard to believe on
> > > a single-core board ;-).
> > >
> > > Tim
> >
> > *Fixes* it? I've been under the impression those changes caused it. I
> > certainly never saw 1000%+ numbers in top until very recently.
>
> Yes, if it's a recent change then mine are to blame. In both cases the
> numbers are imprecise. The older code still in stable@ (as in the OP),
> takes a long time to ramp up and down. So in this case the processes are
> stopped (no, there's no rootkit), but the scheduler takes a long time to
> factor that into its decayed %CPU computation.
>
> In the "new" code, the problem is that fetching the kinfo_proc and the
> current timestamp for that kinfo_proc is not atomic. I have thought
> about "fixing" that by embedding a new timeval in kinfo_proc that is
> stamped with the time the individual kinfo_proc is generated. This would
> (I believe) alleviate the noise in the new code as the delta in walltime
> at the "bottom" of the ratio would then correspond to the delta in runtime
> on the "top".
>
> However, trying to store a timeval in kinfo_proc is quite tricky as all the
> available fields are things like ints and longs. I could perhaps split it
> up into two longs which is kind of fugly. Another option would be to just
> generate a single long that holds raw nanoseconds uptime and store that
> (wrapping would be ok since I would only care about deltas).

So I tried this and the results aren't a lot better. I think the problem now
is that rufetch() doesn't force an update of the target thread's stats to
"now" (the way getrusage() does for curthread). Because the idle thread
runs constantly when the system is idle, it is especially prone to this
imprecision.

I'm not sure of a good way to fix this. Having a per-thread timestamp that
was updated each time the runtime was updated would perhaps help for a
currently-running thread. Another option would be to use an IPI (ewww) to
force currently running threads to update their runtime when the sysctl
runs. That seems a bit expensive though. (I might at least try it to see if
it does resolve it, to verify my understanding of the issue.)

-- 
John Baldwin