From: Robert Watson
To: Rob
cc: freebsd-stable@freebsd.org
Date: Sat, 22 May 2004 12:18:07 -0400 (EDT)
Subject: Re: system command hangs (unkillable); ps shows 'sbwait' state?

On Wed, 19 May 2004, Rob wrote:

> I'm using fairly recent FreeBSD-stable on intel PC.
>
> Within last few days, I encountered two hangs of a system command, that
> I was unable to resolve. I could not kill the command, even a 'kill -9'
> did not work.

I don't remember seeing any commits to -STABLE that seem like likely
candidates for a new change causing this problem. I also haven't seen any
increased instability now that the twe driver problems have been fixed by
the vendor. Have you made any local configuration or load changes?
In particular, has the general system load on your system changed recently
(more web traffic, more I/O) in a measurable way?

> 1. I had a 2.5 Gb disk mounted on /home/software.
> As root, I overloaded the filesystem, with negative percentage left
> on the device (from df command). So as root, I did a 'rm -rf' in
> /home/software, followed by a 'df -h'. But the df command gave no
> response and became unkillable by any means (ctrl-C, kill -9 ).
> Using 'ps', I found the df command in the 'sbwait' state.
>
> 2. I had a usb device mounted as /dev/da0s1 on /mnt. Mounting (as root)
> went all well, but when I unmounted it, as root, the umount command
> hanged, again the umount command was in sbwait state.
> In this case it was even worse: when I killed the xterminal
> where the umount command was hanging, the whole system froze.
> Only power off/on helped me out here.
>
> I don't know what happened; don't know how to further investigate this.
> Has somebody else similar experiences? Is stability going down for
> Stable kernel?

Could you take a look at the instructions in the Handbook on setting up for
kernel debugging, compile the kernel with DDB, and generate stack traces for
the hung processes, along with the output of "show lockedvnodes"? Also, if
you can get a core dump, it might be interesting to see the output of
netstat -mb on the core.

Finally, are you using any features like NIS or NFS? Having umount stuck in
sbwait sounds like a fairly unusual failure mode unless you're using NFS.

Robert N M Watson          FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org   Senior Research Scientist, McAfee Research
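If you are following the kernel-debugging suggestion in this thread, the configuration side might look roughly like the fragment below. It is a minimal sketch that assumes an i386 -STABLE (4.x-era) custom kernel built from a copy of GENERIC. The Handbook's kernel debugging chapter remains the authority on the exact options for a given branch.

```
# Hypothetical additions to a custom kernel configuration file
# (a copy of GENERIC); check the Handbook for your branch before use.
options         DDB             # compile in the interactive kernel debugger
makeoptions     DEBUG=-g        # also build kernel.debug with symbols, for gdb on a crash dump
```

After you rebuild, install and boot the new kernel, enter DDB when a process wedges. On the syscons console this is typically done with Ctrl-Alt-Esc. DDB's `ps` lists processes and their wait channels, and `trace` prints kernel stack traces, such as for the hung df or umount described above. Recovering a core dump also requires a dump device to be configured (the `dumpdev` setting in /etc/rc.conf), so that savecore(8) can save it on the next boot.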