From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: Re: stopped processes using cpu?
Date: Wed, 20 Aug 2014 11:38:40 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20140415; KDE/4.5.5; amd64; ; )
References: <CAA3ZYrAzpxpFNST5ZT-zHvk4Gg38w-yH1dTQj53Fp_rM-hohaA@mail.gmail.com>
 <10AEB4BC-B1B3-4312-A36C-ECE33EC56805@kientzle.com>
 <1408540626.1150.1.camel@revolution.hippie.lan>
In-Reply-To: <1408540626.1150.1.camel@revolution.hippie.lan>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-7"
Content-Transfer-Encoding: quoted-printable
Message-Id: <201408201138.40228.jhb@freebsd.org>
Cc: Allan Jude <allanjude@freebsd.org>, Ian Lepore <ian@freebsd.org>

On Wednesday, August 20, 2014 9:17:06 am Ian Lepore wrote:
> On Tue, 2014-08-19 at 18:45 -0700, Tim Kientzle wrote:
> > On Aug 19, 2014, at 12:28 PM, Allan Jude <allanjude@freebsd.org> wrote:
> >
> > > On 2014-08-19 15:21, Dieter BSD wrote:
> > >> 8.2 on amd64
> > >> Top(1) with no arguments reports that some firefox processes are using
> > >> cpu despite being stopped (via kill -stop pid) for at least several hours.
> > >> Adding -C doesn't change the numbers.  Ps(1) reports the same.
> > >> Interestingly, a firefox that isn't stopped is (correctly?) reported as
> > >> using 0 cpu.  The 100% idle should be correct, but who knows.
> > >>
> > >> last pid: 51932;  load averages:  0.07, 0.99, 1.42 up 14+19:02:56  08:48:28
> > >> 267 processes: 1 running, 138 sleeping, 128 stopped
> > >> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > >> Mem: 1665M Active, 653M Inact, 240M Wired, 95M Cache, 372M Buf, 815M Free
> > >> Swap: 8965M Total, 560K Used, 8965M Free
> > >>
> > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > >> 44188 a           9  44    0   303M   187M STOP   113:19 13.43% firefox-bin
> > >> 92986 b          11  44    0   164M 62848K STOP     0:18  5.03% firefox-bin
> > >> 16507 c          11  44    0   189M 88976K STOP     0:13  0.24% firefox-bin
> > >>  2265 root        1  44    0   248M   193M select 625:38  0.00% Xorg
> > >> 51271 d          10  44    0   233M   128M ucond   12:12  0.00% firefox-bin
> > >
> > > I wonder if jhb@'s new top code solves this. He adjusted the way CPU
> > > usage is tracked to be more responsive, and not based on averages.
> >=20
> > I wonder if jhb@'s new top code fixes the whacky WCPU values we've been
> > seeing on FreeBSD/ARM.  (1713% CPU is a little hard to believe on a
> > single-core board ;-).
> >
> > Tim
> >
>
> *Fixes* it?  I've been under the impression those changes caused it.  I
> certainly never saw 1000%+ numbers in top until very recently.

Yes, if it's a recent change then my changes are to blame.  In both cases
the numbers are imprecise.  The older code still in stable@ (as in the OP)
takes a long time to ramp up and down.  So in this case the processes are
stopped (no, there's no rootkit), but the scheduler takes a long time to
factor that into its decayed %CPU computation.
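
To make the ramp-down concrete, here is a toy model of a decayed estimate.
It is only a sketch: the one-second tick and the exp(-1/20) decay factor are
assumptions picked for illustration, and the function names are made up.
None of this is the scheduler's actual code.

```python
import math

# Assumed decay factor: each one-second tick keeps exp(-1/20) (about 95%)
# of the previous estimate.  Illustrative only, not taken from the kernel.
DECAY = math.exp(-1.0 / 20.0)


def decayed_pctcpu(start_pct, idle_seconds, decay=DECAY):
    """Return the %CPU estimate after idle_seconds ticks with no runtime."""
    pct = start_pct
    for _ in range(idle_seconds):
        pct *= decay
    return pct


def seconds_until_below(start_pct, threshold, decay=DECAY):
    """Count one-second ticks until the estimate falls below threshold."""
    pct, ticks = start_pct, 0
    while pct >= threshold:
        pct *= decay
        ticks += 1
    return ticks


if __name__ == "__main__":
    print(round(decayed_pctcpu(100.0, 20), 2))   # estimate 20 s after stopping
    print(seconds_until_below(100.0, 1.0))       # ticks until it reads under 1%
```

With these assumed numbers, a process that was at 100% still shows roughly
37% twenty seconds after being stopped and needs 93 ticks to drop under 1%.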

In the "new" code, the problem is that fetching the kinfo_proc and the
current timestamp for that kinfo_proc is not atomic.  I have thought
about "fixing" that by embedding a new timeval in kinfo_proc that is
stamped with the time the individual kinfo_proc is generated.  This would
(I believe) alleviate the noise in the new code as the delta in walltime
at the "bottom" of the ratio would then correspond to the delta in runtime
on the "top".
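
A small sketch of why the skew matters, assuming a thread that runs flat out
so its runtime advances at wall-clock speed.  The dict layout and the names
snapshot() and pct_cpu() are hypothetical; they only stand in for "runtime
from kinfo_proc" and "timestamp taken by the reader".

```python
def pct_cpu(prev, cur):
    """%CPU over the interval: delta runtime on top, delta walltime below."""
    wall_delta = cur["stamp_us"] - prev["stamp_us"]
    if wall_delta <= 0:
        return 0.0
    return 100.0 * (cur["runtime_us"] - prev["runtime_us"]) / wall_delta


def snapshot(true_time_us, stamp_lag_us):
    """Hypothetical sample of an always-running thread.

    The runtime is captured at true_time_us; the wall clock is read
    stamp_lag_us later, which models the non-atomic fetch.
    """
    return {"runtime_us": true_time_us,
            "stamp_us": true_time_us + stamp_lag_us}


# Clock read 50 ms after the first record, but immediately for the second.
skewed = pct_cpu(snapshot(0, 50_000), snapshot(1_000_000, 0))

# Stamp generated together with each record, so there is no lag.
stamped = pct_cpu(snapshot(0, 0), snapshot(1_000_000, 0))

print(f"skewed: {skewed:.1f}%  stamped: {stamped:.1f}%")
```

In this made-up case a 50 ms lag on one sample already pushes a fully busy
thread to about 105%, while the per-record stamp gives 100%.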

However, trying to store a timeval in kinfo_proc is quite tricky as all the
available fields are things like ints and longs.  I could perhaps split it
up into two longs which is kind of fugly.  Another option would be to just
generate a single long that holds raw nanoseconds uptime and store that
(wrapping would be ok since I would only care about deltas).
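
The wrapping argument can be sketched with modular arithmetic.  Python
integers do not wrap, so the mask below emulates an unsigned long; the 32-bit
width and the names stamp_ns() and delta_ns() are assumptions for
illustration, not proposed kinfo_proc fields.

```python
LONG_BITS = 32                      # assumption: 32-bit long, the tightest case
LONG_MASK = (1 << LONG_BITS) - 1


def stamp_ns(uptime_ns):
    """Truncate a full nanosecond uptime to what fits in one unsigned long."""
    return uptime_ns & LONG_MASK


def delta_ns(prev_stamp, now_stamp):
    """Elapsed nanoseconds between two stamps, modulo 2**LONG_BITS."""
    return (now_stamp - prev_stamp) & LONG_MASK


prev = stamp_ns(2**32 - 1_000)      # 1000 ns before the counter wraps
now = stamp_ns(2**32 + 500)         # 500 ns after the wrap
elapsed = delta_ns(prev, now)       # still the true 1500 ns
print(elapsed)
```

The delta survives a wrap only while less than 2**LONG_BITS nanoseconds pass
between samples: about 4.29 seconds for 32 bits, about 584 years for 64 bits.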

-- 
John Baldwin