From owner-freebsd-stable@FreeBSD.ORG Thu Feb 16 13:58:54 2006
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id DA01216A420
    for ; Thu, 16 Feb 2006 13:58:54 +0000 (GMT)
    (envelope-from gavin.atkinson@ury.york.ac.uk)
Received: from mail-gw1.york.ac.uk (mail-gw1.york.ac.uk [144.32.128.246])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 6C5B343D8B
    for ; Thu, 16 Feb 2006 13:58:53 +0000 (GMT)
    (envelope-from gavin.atkinson@ury.york.ac.uk)
Received: from buffy.york.ac.uk (buffy-128.york.ac.uk [144.32.128.160])
    by mail-gw1.york.ac.uk (8.12.10/8.12.10) with ESMTP id k1GDwE67019546;
    Thu, 16 Feb 2006 13:58:29 GMT
Received: from buffy.york.ac.uk (localhost [127.0.0.1])
    by buffy.york.ac.uk (8.13.4/8.13.4) with ESMTP id k1GDw8M6076961;
    Thu, 16 Feb 2006 13:58:08 GMT
    (envelope-from gavin.atkinson@ury.york.ac.uk)
Received: (from ga9@localhost)
    by buffy.york.ac.uk (8.13.4/8.13.4/Submit) id k1GDw8IO076960;
    Thu, 16 Feb 2006 13:58:08 GMT
    (envelope-from gavin.atkinson@ury.york.ac.uk)
X-Authentication-Warning: buffy.york.ac.uk: ga9 set sender to
    gavin.atkinson@ury.york.ac.uk using -f
From: Gavin Atkinson
To: Dan Nelson
In-Reply-To: <20060215223432.GH70956@dan.emsphone.com>
References: <1140027060.83368.11.camel@r4.agava-guns.domain>
    <20060215194204.GC70956@dan.emsphone.com>
    <20060215215608.GA55676@xor.obsecurity.org>
    <20060215223432.GH70956@dan.emsphone.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Thu, 16 Feb 2006 13:58:08 +0000
Message-Id: <1140098288.76342.44.camel@buffy.york.ac.uk>
Mime-Version: 1.0
X-Mailer: Evolution 2.4.2.1 FreeBSD GNOME Team Port
X-York-MailScanner: Found to be clean
X-York-MailScanner-From: gavin.atkinson@ury.york.ac.uk
Cc: freebsd-stable@freebsd.org
Subject: Re: Strange process
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 16 Feb 2006 13:58:55 -0000

On Wed, 2006-02-15 at 16:34 -0600, Dan Nelson wrote:
> In the last episode (Feb 15), Kris Kennaway said:
> > On Wed, Feb 15, 2006 at 01:42:04PM -0600, Dan Nelson wrote:
> > > In the last episode (Feb 15), Ivan Kolosovskiy said:
> > > > top:
> > > >   PID USERNAME THR PRI NICE SIZE RES STATE C TIME  WCPU COMMAND
> > > > 38410 findfile   1  96    0   0K  0K START 0 0:00 0.00% grotty
> > > >
> > > > ps:
> > > > host$ ps -waux | grep grotty
> > > > findfile 38410 0,0 0,0 0 0 p6 REJ 19:57 0:00,25 [grotty]
> > >
> > > E in the STAT column means the process is trying to exit, but
> > > can't. What does "ps lp 38410" print? The MWCHAN column should say
> > > where in the kernel the process is stuck.
> >
> > I often see this too. For example:
> >
> >  PID USERNAME THR PRI NICE SIZE RES STATE TIME  WCPU COMMAND
> > 5357 kkenn      1  96    0   0K  0K START 0:00 0.35% xpdf
> >
> > > ps -waux | grep xpdf
> > kkenn 5357 0.3 0.0 0 0 ?? RE Sun08PM 0:00.20 [xpdf]
> >
> > > ps lp 5357
> > UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
>
> That syntax should have worked... Try a plain "px axl | grep xpdf"
> instead.
>
> I think top's START state corresponds to the ~200-line window of code
> in kern_fork.c:fork1() between p_state=PRS_NEW and p_state=PRS_NORMAL,
> but I'm not positive.

In my case (again on 6.0-REL), I have four such processes in top:

  636 root  1 100  4  0K  0K START 0  0:00  5.08% bandwidthd
  612 root  1 100  4  0K  0K START 0  0:00  4.14% bandwidthd
  604 root  1 100  4  0K  0K START 0  0:00  3.39% bandwidthd
  602 root  1 119  4  0K  0K START 1  0:00  0.00% bandwidthd

and in ps -auxl | grep bandwidth :

root 636 5.1 0.0 0 0 d1- RNE 26Jan06 0:00.39 [bandwidthd] 0 595   5 100 4 -
root 612 4.1 0.0 0 0 d1- RNE 26Jan06 0:00.35 [bandwidthd] 0 594   4 100 4 -
root 604 3.4 0.0 0 0 d1- RNE 26Jan06 0:00.29 [bandwidthd] 0 596   5 100 4 -
root 602 0.0 0.0 0 0 d1- RNE 26Jan06 0:00.09 [bandwidthd] 0 597 316 119 4 -

Note that in the top output, these processes have a non-zero WCPU
percentage (which does not change) - I don't know if this means the
processes did get to run briefly, or if they are frozen in time before
that part of the process structure has been cleared out. This
percentage does not count against the system idle percentage in top:

CPU states: 0.0% user, 0.0% nice, 0.2% system, 0.2% interrupt, 99.6% idle

Hope that helps somebody figure out what is happening. Sadly I've not
seen these on a machine with ddb in the kernel yet, so I can't get a
backtrace. If anyone else seeing these has ddb, a backtrace would
probably be interesting to see.

Gavin
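
P.S. For anyone who wants to check other machines for the same thing,
here is a rough sketch of a filter that picks out processes with E in
the STAT column from saved ps output. It is only an illustration: the
function name exiting_processes is made up, and it assumes the column
layout seen in the lines above (USER PID %CPU %MEM VSZ RSS TT STAT
STARTED TIME COMMAND ...), so STAT is taken to be the eighth
whitespace-separated field. Other ps options will need a different
index.

```python
def exiting_processes(ps_lines):
    """Return (pid, command) pairs for ps lines whose STAT contains 'E'.

    Assumes the 'ps -aux'-style layout shown in this thread:
    USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND ...
    """
    found = []
    for line in ps_lines:
        fields = line.split()
        # Skip the header line and anything too short to be a process row.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        stat = fields[7]  # assumed position of the STAT column
        if "E" in stat:   # E: the process is trying to exit
            found.append((int(fields[1]), fields[10]))
    return found


# Two of the lines from the ps output quoted in this message.
sample = [
    "root 636 5.1 0.0 0 0 d1- RNE 26Jan06 0:00.39 [bandwidthd] 0 595 5 100 4 -",
    "root 602 0.0 0.0 0 0 d1- RNE 26Jan06 0:00.09 [bandwidthd] 0 597 316 119 4 -",
]
print(exiting_processes(sample))
```

This only finds candidates; it says nothing about where in the kernel
they are stuck, which still needs the MWCHAN column or a ddb backtrace.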