Date: Thu, 16 Feb 2006 13:58:08 +0000
From: Gavin Atkinson <gavin.atkinson@ury.york.ac.uk>
To: Dan Nelson <dnelson@allantgroup.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: Strange process
Message-ID: <1140098288.76342.44.camel@buffy.york.ac.uk>
In-Reply-To: <20060215223432.GH70956@dan.emsphone.com>
References: <1140027060.83368.11.camel@r4.agava-guns.domain> <20060215194204.GC70956@dan.emsphone.com> <20060215215608.GA55676@xor.obsecurity.org> <20060215223432.GH70956@dan.emsphone.com>

On Wed, 2006-02-15 at 16:34 -0600, Dan Nelson wrote:
> In the last episode (Feb 15), Kris Kennaway said:
> > On Wed, Feb 15, 2006 at 01:42:04PM -0600, Dan Nelson wrote:
> > > In the last episode (Feb 15), Ivan Kolosovskiy said:
> > > > top:
> > > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > > > 38410 findfile 1 96 0 0K 0K START 0 0:00 0.00% grotty
> > > >
> > > > ps:
> > > > host$ ps -waux | grep grotty
> > > > findfile 38410 0,0 0,0 0 0 p6 REJ 19:57 0:00,25 [grotty]
> > >
> > > E in the STAT column means the process is trying to exit, but
> > > can't. What does "ps lp 38410" print? The MWCHAN column should say
> > > where in the kernel the process is stuck.
> >
> > I often see this too. For example:
> >
> > PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
> > 5357 kkenn 1 96 0 0K 0K START 0:00 0.35% xpdf
> >
> > > ps -waux | grep xpdf
> > kkenn 5357 0.3 0.0 0 0 ?? RE Sun08PM 0:00.20 [xpdf]
> >
> > > ps lp 5357
> > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
>
> That syntax should have worked... Try a plain "ps axl | grep xpdf"
> instead.
>
> I think top's START state corresponds to the ~200-line window of code
> in kern_fork.c:fork1() between p_state=PRS_NEW and p_state=PRS_NORMAL,
> but I'm not positive.

In my case (again on 6.0-REL), I have four such processes in top:

  636 root 1 100 4 0K 0K START 0 0:00 5.08% bandwidthd
  612 root 1 100 4 0K 0K START 0 0:00 4.14% bandwidthd
  604 root 1 100 4 0K 0K START 0 0:00 3.39% bandwidthd
  602 root 1 119 4 0K 0K START 1 0:00 0.00% bandwidthd

and in ps -auxl | grep bandwidth :

root 636 5.1 0.0 0 0 d1- RNE 26Jan06 0:00.39 [bandwidthd] 0 595 5 100 4 -
root 612 4.1 0.0 0 0 d1- RNE 26Jan06 0:00.35 [bandwidthd] 0 594 4 100 4 -
root 604 3.4 0.0 0 0 d1- RNE 26Jan06 0:00.29 [bandwidthd] 0 596 5 100 4 -
root 602 0.0 0.0 0 0 d1- RNE 26Jan06 0:00.09 [bandwidthd] 0 597 316 119 4 -

Note that in the top output, these processes have a non-zero WCPU
percentage (which does not change). I don't know if this means the
processes did get to run briefly, or if they are frozen in time before
that part of the process structure has been cleared out. This
percentage does not count against the system idle percentage in top:

CPU states: 0.0% user, 0.0% nice, 0.2% system, 0.2% interrupt, 99.6% idle

Hope that helps somebody figure out what is happening. Sadly, I've not
seen these on a machine with ddb in the kernel yet, so I can't get a
backtrace. If anyone else seeing these has ddb, a backtrace would
probably be interesting to see.

Gavin
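
For anyone wanting to look for these processes on several machines, the
check discussed above (an E flag in the STAT column of ps output) can be
scripted. What follows is only a minimal Python sketch and not something
taken from this thread: the function name exiting_processes and the
column index are assumptions based on the "ps -aux" layout seen above,
and the header and cron rows in the sample are made up for illustration.
Only the bandwidthd row comes from the listing above.

```python
def exiting_processes(ps_lines, stat_col=7):
    """Return (pid, stat) for rows whose STAT field contains 'E'.

    Assumes whitespace-separated "ps -aux" style rows, where field 1 is
    the PID and field 7 is STAT (an assumption; adjust stat_col for
    other ps formats).
    """
    hits = []
    for line in ps_lines:
        fields = line.split()
        # Skip the header row and anything too short to hold a STAT field.
        if len(fields) <= stat_col or not fields[1].isdigit():
            continue
        stat = fields[stat_col]
        if "E" in stat:
            hits.append((int(fields[1]), stat))
    return hits


# Sample input: the header and the cron row are hypothetical.
sample = """\
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 636 5.1 0.0 0 0 d1- RNE 26Jan06 0:00.39 [bandwidthd]
root 700 0.0 0.1 1234 800 d1- Ss 26Jan06 0:01.00 /usr/sbin/cron -s
"""

print(exiting_processes(sample.splitlines()))  # [(636, 'RNE')]
```

The sketch works on text that has already been captured, so it can be
fed saved ps output from any host. It only reports the PID and the raw
STAT string; it does not say where in the kernel a process is stuck,
which would still need the MWCHAN column or a ddb backtrace.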