From: Robert Watson
Date: Tue, 23 Sep 2008 21:38:19 +0100 (BST)
To: Mikhail Teterin
Cc: stable@FreeBSD.org
Subject: Re: 7.0-stable: a hung process - scheduler bug?
In-Reply-To: <48D92589.8000200@aldan.algebra.com>

On Tue, 23 Sep 2008, Mikhail Teterin wrote:

> 37126 33012 8371 2425 2425 1 mi - FreeBSD ELF64 dmake
>
>   PID    TID COMM  TDNAME CPU PRI STATE WCHAN
> 37126 100724 dmake -        1 193 sleep -
>
> There are no problems ktrace-ing 33012, but nothing comes from there, as
> that process simply waits for its child. I guess, the child -- 37126 was
> (v)forked to launch a compiler or some such and remains stuck in between
> (v)fork and exec somewhere...

(lots of details elided)

Yes, there's a period during exec where attaching debuggers isn't allowed, so if something gets wedged or otherwise lost there, ktrace isn't much use.
On the other hand, if it's stuck there, then there are no syscalls going on anyway. Could you try procstat -kk on the process and see whether that sheds any light? Another alternative, if you have DDB compiled in, is to break to the debugger and do a stack trace, or to use gdb on /dev/mem if you have a kernel.symbols file. This may help us understand more about what is going on.

Robert N M Watson
Computer Laboratory
University of Cambridge
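
For reference, the suggestions above might be tried roughly as follows. This is only a sketch under a few assumptions: diag_cmds is a made-up helper name, the PID is the one from the quoted output, the kernel path is the usual default and may differ on your system, and kgdb stands in for the gdb invocation mentioned above. The helper only prints the commands rather than running them, so nothing here touches the kernel.

```shell
#!/bin/sh
# Sketch only: print the diagnostic commands for a given PID instead of
# running them. diag_cmds is a hypothetical helper, not a FreeBSD tool.
diag_cmds() {
    pid="$1"
    # Kernel stack of each thread in the process, with function offsets.
    echo "procstat -kk $pid"
    # If DDB is compiled in: break to the debugger from a root shell,
    # then ask for a stack trace of the process at the db> prompt.
    echo "sysctl debug.kdb.enter=1"
    echo "db> trace $pid"
    # Without DDB: inspect the live kernel through /dev/mem.
    # The kernel path is an assumption; adjust it to match your install.
    echo "kgdb /boot/kernel/kernel /dev/mem"
}

diag_cmds 37126
```

The procstat step is the least intrusive and is worth trying first. Breaking to DDB stops the whole machine until you type "continue", so on a remote box make sure you have console access before going that way.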