Date:      Thu, 16 Feb 2006 13:58:08 +0000
From:      Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>
To:        Dan Nelson <dnelson@allantgroup.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Strange process
Message-ID:  <1140098288.76342.44.camel@buffy.york.ac.uk>
In-Reply-To: <20060215223432.GH70956@dan.emsphone.com>
References:  <1140027060.83368.11.camel@r4.agava-guns.domain> <20060215194204.GC70956@dan.emsphone.com> <20060215215608.GA55676@xor.obsecurity.org> <20060215223432.GH70956@dan.emsphone.com>

On Wed, 2006-02-15 at 16:34 -0600, Dan Nelson wrote:
> In the last episode (Feb 15), Kris Kennaway said:
> > On Wed, Feb 15, 2006 at 01:42:04PM -0600, Dan Nelson wrote:
> > > In the last episode (Feb 15), Ivan Kolosovskiy said:
> > > > top:
> > > > PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> > > > 38410 findfile    1  96    0     0K     0K START  0   0:00  0.00% grotty
> > > > 
> > > > ps:
> > > > host$ ps -waux | grep grotty
> > > > findfile 38410  0,0  0,0     0     0  p6  REJ  19:57     0:00,25 [grotty]
> > > 
> > > E in the STAT column means the process is trying to exit, but
> > > can't. What does "ps lp 38410" print?  The MWCHAN column should say
> > > where in the kernel the process is stuck.
> > 
> > I often see this too.  For example:
> > 
> >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> >  5357 kkenn         1  96    0     0K     0K START    0:00  0.35% xpdf
> > 
> > > ps -waux  | grep xpdf
> > kkenn    5357  0.3  0.0     0     0  ??  RE   Sun08PM   0:00.20 [xpdf]
> > 
> > > ps lp 5357
> >   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
> 
> That syntax should have worked...  Try a plain "ps axl | grep xpdf"
> instead.
> 
> I think top's START state corresponds to the ~200-line window of code
> in kern_fork.c:fork1() between p_state=PRS_NEW and p_state=PRS_NORMAL,
> but I'm not positive.

In my case (again on 6.0-REL), I have four such processes in top:

  636 root        1 100    4     0K     0K START  0   0:00  5.08% bandwidthd
  612 root        1 100    4     0K     0K START  0   0:00  4.14% bandwidthd
  604 root        1 100    4     0K     0K START  0   0:00  3.39% bandwidthd
  602 root        1 119    4     0K     0K START  1   0:00  0.00% bandwidthd

and in ps -auxl | grep bandwidth :

root       636  5.1  0.0     0     0  d1- RNE  26Jan06   0:00.39 [bandwidthd]         0   595   5 100  4 -
root       612  4.1  0.0     0     0  d1- RNE  26Jan06   0:00.35 [bandwidthd]         0   594   4 100  4 -
root       604  3.4  0.0     0     0  d1- RNE  26Jan06   0:00.29 [bandwidthd]         0   596   5 100  4 -
root       602  0.0  0.0     0     0  d1- RNE  26Jan06   0:00.09 [bandwidthd]         0   597 316 119  4 -
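
For anyone reading along, the STAT strings above (RNE here, RE and REJ
earlier in the thread) can be decoded mechanically.  Below is a rough
Python sketch, not anything from the base system: the flag meanings are
taken from ps(1), only a subset is covered, and the function names
decode_stat and exiting_pids are made up for illustration.  It assumes
the "ps -waux" column layout shown in this thread, where STAT is the
eighth whitespace-separated field.

```python
# Subset of the state letters and modifier flags described in ps(1).
# Anything not listed here is reported as unknown rather than guessed.
STATE_LETTERS = {
    "D": "uninterruptible disk wait",
    "I": "idle (sleeping longer than about 20 seconds)",
    "L": "waiting to acquire a lock",
    "R": "runnable",
    "S": "sleeping (less than about 20 seconds)",
    "T": "stopped",
    "Z": "zombie",
}
MODIFIERS = {
    "E": "trying to exit",
    "J": "in jail",
    "N": "reduced CPU scheduling priority (nice)",
    "s": "session leader",
    "+": "in the foreground process group of its terminal",
    "<": "raised CPU scheduling priority",
}


def decode_stat(stat):
    """Return a list of descriptions, one per character of a STAT string."""
    if not stat:
        return []
    out = [STATE_LETTERS.get(stat[0], "unknown state %r" % stat[0])]
    for ch in stat[1:]:
        out.append(MODIFIERS.get(ch, "unknown flag %r" % ch))
    return out


def exiting_pids(ps_text):
    """Return the PIDs whose STAT field carries the E (exiting) modifier."""
    pids = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to hold a STAT column.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        # Only look past the first character, which is the run state.
        if "E" in fields[7][1:]:
            pids.append(int(fields[1]))
    return pids


# Two lines copied from the output above, plus one hypothetical healthy
# process (the sshd line is invented purely for contrast).
SAMPLE = """\
USER       PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
root       636  5.1  0.0     0     0  d1- RNE  26Jan06   0:00.39 [bandwidthd]
root       602  0.0  0.0     0     0  d1- RNE  26Jan06   0:00.09 [bandwidthd]
root       450  0.0  0.1  3000  2000  ??  Ss   26Jan06   0:00.10 /usr/sbin/sshd
"""

if __name__ == "__main__":
    print(decode_stat("RNE"))
    print(exiting_pids(SAMPLE))
```

On a live system the text passed to exiting_pids would be captured
"ps -waux" output rather than the pasted sample.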

Note that in the top output, these processes have a non-zero WCPU
percentage (which does not change) - I don't know if this means the
processes did get to run briefly, or if they are frozen in time before
that part of the process structure has been cleared out.  This
percentage does not count against the system idle percentage in top:

CPU states:  0.0% user,  0.0% nice,  0.2% system,  0.2% interrupt, 99.6% idle

Hope that helps somebody figure out what is happening.  Sadly I've not
seen these on a machine with ddb in the kernel yet, so I can't get a
backtrace.  If anyone else seeing these has ddb, a backtrace would
probably be interesting to see.

Gavin
