From owner-freebsd-bugs@FreeBSD.ORG Tue Jun 5 20:50:11 2007
Date: Tue, 5 Jun 2007 20:50:10 GMT
Message-Id: <200706052050.l55KoABX076749@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: "Jeffrey D. Wheelhouse"
Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Reply-To: "Jeffrey D. Wheelhouse"

The following reply was made to PR kern/104406; it has been noted by GNATS.

From: "Jeffrey D. Wheelhouse"
To: bug-followup@FreeBSD.org
Subject: Re: kern/104406: [ufs] Processes get stuck in "ufs" state under persistent CPU load
Date: Tue, 05 Jun 2007 16:26:26 -0400

I believe we have also experienced this bug (or a very similar one) on our 8-core amd64 systems under 6.2-RELEASE-p4.

In our case, "top" shows that the system is 100% CPU utilized, with the vast majority of it as "system" time.
(Ordinarily the system

In the last case, we ended up with about 200 Apache processes that looked like this in ps:

  UID   PID  PPID  CPU PRI NI    VSZ   RSS MWCHAN STAT TT     TIME COMMAND
25000 27121 26860 1977  -4  5 146324 33732 ufs    DN   ??  0:03.75 httpd
25000 27147 37257 1994  -4  5 153748 29280 ufs    DN   ??  0:03.72 httpd
25000 27157 36912 1805  -4  5 150756 26592 ufs    DN   ??  0:02.91 httpd
25000 27224 27030 1845  -4  5 137536 24804 ufs    DN   ??  0:01.25 httpd
25000 27274 26794 1829  -4  5 148140 35416 ufs    DN   ??  0:02.90 httpd

Once a process gets "stuck" in WCHAN ufs, it's blocked indefinitely, as described here, or at least so slow as to be indistinguishable from stuck. (Typical wait channels for our httpds are accept or kqread, as one would expect.)

Each process in this state counts against the load average, so we often see load averages north of 200 when this is occurring. (Typical load average is below 2.)

Kill enough processes (or possibly enough to hit the "right" process) and everything picks up again right where it left off.

I also have no idea how to debug this.

Thanks,

Jeff

--
Jeff Wheelhouse
jdw@wheelhouse.org