Date:      Thu, 14 Sep 2006 18:08:51 +1000 (EST)
From:      Ian Smith <smithi@nimnet.asn.au>
To:        Giorgos Keramidas <keramida@FreeBSD.org>
Cc:        "Tamouh H." <hakmi@rogers.com>, questions@FreeBSD.org
Subject:   Re: Top not showing cpu usage even remotely accurately
Message-ID:  <Pine.BSF.3.96.1060914165639.8650A-100000@gaia.nimnet.asn.au>
In-Reply-To: <20060914054758.GA77575@gothmog.pc>

On Thu, 14 Sep 2006, Giorgos Keramidas wrote:

 > On 2006-09-14 00:48, "Tamouh H." <hakmi@rogers.com> wrote:
 > > I think TOP and load averages are no longer accurate on FBSD 5.x and
 > > 6.x with an SMP kernel, as far as I've seen. Load averages sometimes hit
 > > 8.0 without a noticeable degradation in performance.

I still can't fathom what top tells me on a UP 5.5-STABLE system (300MHz
Celeron if speed's relevant).  I initiated this thread (weeks ago :) re
seeing 0.0% idle (as expected) during buildworld but not seeing anything
add up to anything like 100%, including S)ystem processes, in top. 

Chuck Swiger pointed out that a buildworld runs lots of processes for
far shorter times than top's sampling interval, which was true, as a
browse with 'lastcomm -eE | less' through the buildworld time showed.
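
To put a rough number on Chuck's point, here is a toy model in Python.
It is only a sketch and not how top itself does its accounting; the 2
second interval and the 0.1 second process lifetime are made-up figures
for illustration.

```python
def visible_share(interval_s, lifetime_s):
    """Toy model: share of a fully busy sampling interval that can be
    attributed to the one process still alive when the sample is taken."""
    # Assumption: processes run back to back, one at a time, each living
    # lifetime_s seconds, so the CPU is 100% busy for the whole interval.
    # Only the process alive at the sample instant shows up in the listing,
    # and it cannot have used more CPU time than its own lifetime.
    return min(lifetime_s, interval_s) / interval_s


# Hypothetical numbers: 2 s between samples, 0.1 s per short-lived process.
print(f"{visible_share(2.0, 0.1):.0%} of the busy time is visible")
```

Under those assumptions the CPU is flat out, yet only about a twentieth
of that time belongs to anything a sampled process list can show.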

However that doesn't explain this typical top view when the system is
quiescent or nearly so, as it mostly is, with only 5-minutely crons and
11-minutely entropy runs and the odd sendmail to be seen in lastcomm: 

last pid: 18500;  load averages:  0.01,  0.08,  0.06    up 5+08:40:33 17:30:30
136 processes: 3 running, 110 sleeping, 23 waiting
CPU states:  5.7% user,  0.0% nice,  6.3% system,  0.0% interrupt, 88.0% idle
Mem: 73M Active, 18M Inact, 46M Wired, 8108K Cache, 25M Buf, 2572K Free
Swap: 384M Total, 106M Used, 278M Free, 27% Inuse

  PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
   11 root     171   52     0K     8K RUN    102.3H 86.82% 86.82% idle
  743 smithi    96    0 26616K  2908K select 156:40  1.03%  1.03% kdeinit
  708 smithi    96    0 34140K 15024K select 223:05  0.63%  0.63% Xorg
  644 root      96    0  1244K   244K select  30:19  0.05%  0.05% moused
  775 smithi    20    0 11524K  1028K kserel 319:17  0.00%  0.00% xmms
  761 smithi    96    0 30824K  7272K select  97:50  0.00%  0.00% kdeinit
   27 root      76  -43     0K     8K RUN     44:14  0.00%  0.00% swi5: clock s
  772 smithi    96    0 29736K  5600K select  40:57  0.00%  0.00% kdeinit
  777 smithi     8    0  2300K   448K nanslp  36:20  0.00%  0.00% asapm
  778 smithi     8    0  2524K   460K nanslp  34:12  0.00%  0.00% ascpu
  767 smithi    96    0 29448K  5612K select  29:23  0.00%  0.00% kdeinit
  771 smithi    96    0 29884K  5504K select  22:28  0.00%  0.00% kdeinit
  616 mysql     20    0 50824K  1428K kserel  21:04  0.00%  0.00% mysqld
  759 smithi    96    0 29644K  5092K select  20:56  0.00%  0.00% kdeinit
  773 smithi    96    0 35640K  4080K select  20:39  0.00%  0.00% kdeinit
  766 smithi    96    0 29488K  4768K select  19:07  0.00%  0.00% kdeinit
  764 smithi    96    0 28784K  3964K select  16:38  0.00%  0.00% kdeinit
  774 smithi    96    0 33168K  3768K select  16:36  0.00%  0.00% kdeinit
  757 smithi    96    0 27272K  5508K select   4:55  0.00%  0.00% kdeinit
   23 root     -60 -179     0K     8K WAIT     3:04  0.00%  0.00% irq12: psm0
   22 root     -80 -199     0K     8K WAIT     3:02  0.00%  0.00% irq11: cbb0 c
   43 root      20    0     0K     8K syncer   3:00  0.00%  0.00% syncer
    4 root      -8    0     0K     8K -        2:58  0.00%  0.00% g_down
    3 root      -8    0     0K     8K -        2:30  0.00%  0.00% g_up
   49 root      12    0     0K     8K -        2:09  0.00%  0.00% schedcpu
   30 root     -16    0     0K     8K -        1:53  0.00%  0.00% yarrow
   39 root     -16    0     0K     8K psleep   1:30  0.00%  0.00% pagedaemon
   41 root     171   52     0K     8K pgzero   1:25  0.00%  0.00% pagezero
[..]

It never shows more than about 90% idle, whereas a short-term load
average of 0.01 should indicate more like 99% idle, shouldn't it?  97-99%,
sometimes 100% idle, was what FreeBSD 4.5-R used to tell me with the same
workload and around the same memory use, but maybe 4.5 was optimistic ..
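
The rule of thumb I'm leaning on there can be written down as a short
Python sketch.  It assumes the load average can be read as the average
number of runnable processes per CPU, which is a simplification: the
load average is a sampled, decaying count of runnable processes, not a
measure of CPU time, so treat the result as a rough expectation only.

```python
def expected_idle_pct(load_avg, ncpu=1):
    """Rough idle percentage implied by a load average (rule of thumb only)."""
    # Assumption: load_avg / ncpu approximates the busy fraction of each CPU,
    # capped at 1.0 because a CPU cannot be more than fully busy.
    busy = min(load_avg / ncpu, 1.0)
    return 100.0 * (1.0 - busy)


reported_idle = 88.0                      # from the CPU states line above
implied_idle = expected_idle_pct(0.01)    # 1-minute load average, one CPU
print(f"implied {implied_idle:.1f}% idle, top reports {reported_idle:.1f}%")
```

The gap between the implied and reported figures is the roughly ten
percentage points I can't explain.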

 > > This is one TOP that freaked me out: notice Idle CPU is 70% while the
 > > process is shown using 99% of CPU. systat draws a more accurate
 > > picture; however, load average is still useless as far as performance
 > > monitoring goes:
 > >
 > > last pid: 10174;  load averages:  1.63,  1.44,  1.20  up 4+00:25:19  00:39:20
 > > 169 processes: 2 running, 166 sleeping, 1 zombie
 > > CPU states: 25.8% user,  0.0% nice,  0.7% system,  0.1% interrupt, 73.4% idle
 > > Mem: 1316M Active, 1445M Inact, 297M Wired, 127M Cache, 112M Buf, 79M Free
 > > Swap: 8762M Total, 2096K Used, 8760M Free
 > >
 > >   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
 > > 13362 root     111    0 36444K 34196K CPU3   3  50:06 98.88% 98.88% perl5.8.7
 > > 90391 root      96    0 27356K 26236K select 2   0:06  0.54%  0.54% perl5.8.7
 > > 79619 nobody     4    0   209M 84640K sbwait 1   0:09  0.39%  0.39% httpd
 > > 10161 root      97    0  6712K  4752K select 2   0:00  1.40%  0.20% exim-4.62-0
 > > 79649 nobody    20    0   210M 84464K lockf  0   0:06  0.15%  0.15% httpd
 > 
 > Apparently, you have a 4-CPU system :-)
 > 
 > What you see displayed as "CPU" is for one of the processors, not for
 > all of them.  Load average is not an easy thing to update for an SMP
 > system, I guess.  There are two options:

That idle looks right for one busy cpu of four, though what the other
0.63 of the load average consists of is less clear.  In my recent top shot
above, ordered by c)pu, I can't see more than 2 or 3% accounted for of
the ~12% that is not idle, i.e. what processes are involved with the 5.7%
user and 6.3% system usage?
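
For anyone who wants to check the sums, this Python sketch redoes the
arithmetic using only figures copied from the two top displays in this
message.  The one assumption is that the CPU states line is an average
over all processors while the per-process column is relative to a single
processor, which is how I read Giorgos's explanation.

```python
def non_idle(states):
    """Sum every CPU state percentage except idle."""
    return sum(pct for name, pct in states.items() if name != "idle")


# My uniprocessor box: CPU states line and the non-idle rows showing CPU > 0.
up_states = {"user": 5.7, "nice": 0.0, "system": 6.3, "interrupt": 0.0, "idle": 88.0}
up_procs = {"kdeinit": 1.03, "Xorg": 0.63, "moused": 0.05}
unaccounted = non_idle(up_states) - sum(up_procs.values())

# The 4-CPU box: one perl process at 98.88% of a single processor.
smp_states = {"user": 25.8, "nice": 0.0, "system": 0.7, "interrupt": 0.1, "idle": 73.4}
one_of_four = 98.88 / 4  # that process as a share of all four CPUs

print(f"UP box: {unaccounted:.2f}% of CPU busy but not attributed to a process")
print(f"4-CPU box: perl alone is {one_of_four:.2f}% overall, "
      f"top reports {non_idle(smp_states):.1f}% non-idle")
```

So the 4-CPU figures hang together, while on my box most of the busy
time has no visible owner.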

In FreeBSD 4, if (say) Mozilla went mad on some crappy javascript loop,
top would show idle at 0.0% and the busy process at or nearer 100%,
making it easy to spot and, if necessary, kill.  Since running 5.4-R and
now 5.5-STABLE, such 0.0% idle events can happen with top not showing
the process involved looking busy at all - I'll capture this next time -
and while it's usually obvious that (usually) Mozilla's the 'culprit' and
killing it frees the system, I'm still bemused that top can't 'see' it.
 
Re the 4-cpu box:

 > I don't remember off-hand how 5.X or 6.X calculate their load-average,
 > but I'd be interested to know what you expected it to show, or what it
 > shows on Linux systems.

I've only a few years watching 4.5-R on this laptop for comparison :) 
but am installing 6.1 on a newer machine any day now, and will report.

Cheers, Ian



