Date: Tue, 28 Dec 2004 21:19:08 +1100
From: Peter Jeremy
To: Benjamin Lutz
cc: freebsd-stable@freebsd.org
Subject: Re: slow system freeze - data
Message-ID: <20041228101908.GD7189@cirb503493.alcatel.com.au>
In-Reply-To: <200412280327.03752.benlutz@datacomm.ch>

On Tue, 2004-Dec-28 03:27:00 +0100, Benjamin Lutz wrote:
>> The info you dumped shows that there's a filesystem deadlock on
>> ad4s1f.
>
>In case you haven't guessed, that'd be my /usr.

I hadn't really considered which filesystem it was.  Locked root
vnodes are bad news anywhere - basically, once the root vnode is
locked, the entire filesystem will quickly become inaccessible.  The
only marginally good point is that you can still save data in other
partitions.

>> Unfortunately, it's not clear (to me) where to go next. Printing the
>> locked vnodes might help but that's not easy to do without gdb.
>
>You mean that's the point where I need serial console access? I hope to
>have that running after the holidays.

Running a serial gdb session would be nice but the alternative is to
force a crashdump and debug it offline.  (In theory, you should be able
to issue "call doadump" or "panic" as ddb commands and then use
"gdb -k /usr/obj/usr/src/sys/KERNELNAME/kernel.debug /var/crash/vmcore.N"
but I don't recall the final state of all this functionality in 5.3).
Another option is to use "x" or "print" in ddb to dump the relevant
number of bytes and then manually decode it using the struct vnode
definition in - and I'm not sure what useful information that would
impart.

>> >The first app that froze as far as I could tell was xmms.
>>
>> Actually, the locks suggest that the problem started with pid 678 -
>> kdeinit. This is unlikely to be
>
>Well, xmms is just the first app where it became apparent :)
>PID 678 is really kded (at least it is at the moment

3.jpg shows pid 678 as kdeinit.  (I'm not really sure what purpose
kdeinit processes serve other than clogging up the process table but
I don't run KDE and am trying to convince my son to give up on it).

>Btw, is my assumption that this is a kernel problem, not a problem with
>any of my applications, correct?

Apart from the rtprio/idprio inversion problems (which you've already
ruled out) it shouldn't be anything application related.  Kris's point
about disk problems is well taken - I hadn't mentioned it because
disk problems would normally leave syslog and console messages.

Unfortunately, at this point, I'm a bit stumped.

-- 
Peter Jeremy
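
P.S. For the manual-decode route, here is a rough Python sketch of the
mechanics only: it turns hex words copied from a memory dump back into
bytes and unpacks them against a field layout.  Both the input line
format ("addr: word word ...") and the three-field layout in FIELDS are
assumptions made up for illustration.  They are NOT the real struct
vnode, whose layout has to be read from the kernel headers matching the
kernel being debugged, and the word size and byte order assume a 32-bit
little-endian machine.

```python
import struct

# Hypothetical layout, for illustration only: three 32-bit little-endian
# fields. Replace with the real field list from the kernel headers.
FIELDS = [("field_a", "I"), ("field_b", "I"), ("field_c", "I")]


def parse_dump(text):
    """Collect hex words from lines of the assumed form 'addr: w0 w1 ...'."""
    words = []
    for line in text.splitlines():
        _, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that have no address prefix
        words.extend(int(w, 16) for w in rest.split())
    return words


def decode(words, fields=FIELDS):
    """Pack the 32-bit words into bytes, then unpack them by field layout."""
    raw = struct.pack("<%dI" % len(words), *words)
    fmt = "<" + "".join(code for _, code in fields)
    size = struct.calcsize(fmt)
    if len(raw) < size:
        raise ValueError("dump too short: need %d bytes, got %d"
                         % (size, len(raw)))
    values = struct.unpack(fmt, raw[:size])
    return dict(zip((name for name, _ in fields), values))
```

Feeding it something like decode(parse_dump("c1234000: 00000001
deadbeef 00000010")) gives one value per named field.  Whether the
decoded fields tell you anything useful is a separate question.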