Date: Wed, 03 Sep 2014 16:05:49 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Ian Lepore <ian@freebsd.org>, Allan Jude <allanjude@freebsd.org>
Subject: Re: stopped processes using cpu?
Message-ID: <1567020.dfiLmFunn8@ralph.baldwin.cx>
In-Reply-To: <201408201138.40228.jhb@freebsd.org>
References: <CAA3ZYrAzpxpFNST5ZT-zHvk4Gg38w-yH1dTQj53Fp_rM-hohaA@mail.gmail.com>
 <1408540626.1150.1.camel@revolution.hippie.lan>
 <201408201138.40228.jhb@freebsd.org>
On Wednesday, August 20, 2014 11:38:40 AM John Baldwin wrote:
> On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> > On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > > On Aug 19, 2014, at 12:28 PM, Allan Jude <allanjude@freebsd.org> wrote:
> > > > On 2014-08-19 15:21, Dieter BSD wrote:
> > > > > 8.2 on amd64
> > > > > Top(1) with no arguments reports that some firefox processes are
> > > > > using cpu despite being stopped (via kill -stop pid) for at least
> > > > > several hours. Adding -C doesn't change the numbers. Ps(1) reports
> > > > > the same. Interestingly, a firefox that isn't stopped is
> > > > > (correctly?) reported as using 0 cpu. The 100% idle should be
> > > > > correct, but who knows.
> > > > >
> > > > > last pid: 51932;  load averages: 0.07, 0.99, 1.42  up 14+19:02:56  08:48:28
> > > > > 267 processes: 1 running, 138 sleeping, 128 stopped
> > > > > CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > > > Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > > > > Swap: 8965M Total, 560K Used, 8965M Free
> > > > >
> > > > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > > > > 44188 a           9  44    0   303M   187M STOP   113:19 13.43% firefox-bin
> > > > > 92986 b          11  44    0   164M 62848K STOP     0:18  5.03% firefox-bin
> > > > > 16507 c          11  44    0   189M 88976K STOP     0:13  0.24% firefox-bin
> > > > >  2265 root        1  44    0   248M   193M select 625:38  0.00% Xorg
> > > > > 51271 d          10  44    0   233M   128M ucond   12:12  0.00% firefox-bin
> > > >
> > > > I wonder if jhb@'s new top code solves this. He adjusted the way CPU
> > > > usage is tracked to be more responsive, and not based on averages.
> > >
> > > I wonder if jhb@'s new top code fixes the whacky WCPU values we've
> > > been seeing on FreeBSD/ARM. (1713% CPU is a little hard to believe
> > > on a single-core board ;-).
> > >
> > > Tim
> >
> > *Fixes* it? I've been under the impression those changes caused it. I
> > certainly never saw 1000%+ numbers in top until very recently.
>
> Yes, if it's a recent change then mine are to blame. In both cases the
> numbers are imprecise. The older code still in stable@ (as in the OP)
> takes a long time to ramp up and down. So in this case the processes are
> stopped (no, there's no rootkit), but the scheduler takes a long time to
> factor that into its decayed %CPU computation.
>
> In the "new" code, the problem is that fetching the kinfo_proc and the
> current timestamp for that kinfo_proc is not atomic. I have thought
> about "fixing" that by embedding a new timeval in kinfo_proc that is
> stamped with the time the individual kinfo_proc is generated. This would
> (I believe) alleviate the noise in the new code, as the delta in
> walltime at the "bottom" of the ratio would then correspond to the delta
> in runtime on the "top".
>
> However, trying to store a timeval in kinfo_proc is quite tricky, as all
> the available fields are things like ints and longs. I could perhaps
> split it up into two longs, which is kind of fugly. Another option would
> be to just generate a single long that holds raw nanoseconds of uptime
> and store that (wrapping would be ok since I would only care about
> deltas).

So I tried this, and the results aren't a lot better. I think the problem
now is that rufetch() doesn't force an update of the target thread's stats
to "now" (the way getrusage() does for curthread). Because the idle thread
runs constantly when idle, it is especially prone to this imprecision. I'm
not sure of a good way to fix this.
Having a per-thread timestamp that was updated each time the runtime was
updated would help for a currently-running thread perhaps. Another option
would be to use an IPI (ewww) to force currently running threads to update
their runtime when the sysctl runs. That seems a bit expensive though. (I
might at least try it to see if it does resolve it to verify my
understanding of the issue.)

-- 
John Baldwin
