Date:      Tue, 28 Dec 2004 21:19:08 +1100
From:      Peter Jeremy <PeterJeremy@optushome.com.au>
To:        Benjamin Lutz <benlutz@datacomm.ch>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: slow system freeze - data
Message-ID:  <20041228101908.GD7189@cirb503493.alcatel.com.au>
In-Reply-To: <200412280327.03752.benlutz@datacomm.ch>
References:  <Pine.NEB.3.96L.1041223102130.89131C-100000@fledge.watson.org> <200412260814.53592.benlutz@datacomm.ch> <20041228013844.GC7189@cirb503493.alcatel.com.au> <200412280327.03752.benlutz@datacomm.ch>

On Tue, 2004-Dec-28 03:27:00 +0100, Benjamin Lutz wrote:
>> The info you dumped shows that there's a filesystem deadlock on
>> ad4s1f.
>
>In case you haven't guessed, that'd be my /usr.

I hadn't really considered which filesystem it was.  Locked root vnodes
are bad news anywhere - basically, once the root vnode is locked, the
entire filesystem will quickly become inaccessible.  The only marginally
good point is that you can still save data on other partitions.

>> Unfortunately, it's not clear (to me) where to go next.  Printing the
>> locked vnodes might help but that's not easy to do without gdb.
>
>You mean that's the point where I need serial console access? I hope to 
>have that running after the holidays.

Running a serial gdb session would be nice but the alternative is to
force a crashdump and debug it offline.  (In theory, you should be
able to issue "call doadump" or "panic" as ddb commands and then
use "gdb -k /usr/obj/usr/src/sys/KERNELNAME/kernel.debug /var/crash/vmcore.N"
but I don't recall the final state of all this functionality in 5.3).
The other option is to use "x" or "print" to dump the relevant number of
bytes and then manually decode it using the struct vnode definition in
<sys/vnode.h> - and I'm not sure what useful information that would impart.
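
For what it's worth, the manual decoding could be scripted rather than
done by hand.  The following is only a rough sketch: it assumes the dump
was captured as text lines of the form "ADDRESS: word word ...", that the
words are 32-bit values from a little-endian (i386) machine, and that the
fields are packed without padding.  The field names and layout are
placeholders for illustration, not the real struct vnode; the actual
offsets would have to be taken from <sys/vnode.h> for the running kernel.

```python
import struct


def words_from_dump(dump_text):
    """Collect hex words from lines shaped like 'ADDRESS: w0 w1 w2 ...'."""
    words = []
    for line in dump_text.splitlines():
        if ":" not in line:
            continue  # skip prompts and blank lines
        _, _, rest = line.partition(":")
        words.extend(int(tok, 16) for tok in rest.split())
    return words


def decode(words, layout):
    """Repack 32-bit little-endian words as bytes, then unpack per layout."""
    raw = struct.pack("<%dI" % len(words), *words)
    fmt = "<" + "".join(code for _, code in layout)
    values = struct.unpack_from(fmt, raw)
    return dict(zip((name for name, _ in layout), values))


# HYPOTHETICAL layout for illustration only; NOT the real struct vnode.
EXAMPLE_LAYOUT = [
    ("field_a", "I"),  # 32-bit unsigned
    ("field_b", "I"),  # 32-bit unsigned (e.g. a pointer on i386)
    ("field_c", "H"),  # 16-bit unsigned
    ("field_d", "H"),  # 16-bit unsigned
]

# Made-up sample data in the assumed "ADDRESS: words" shape.
sample = """0xc1000000: 00000001 c0ffee00
0xc1000008: 00020003"""

fields = decode(words_from_dump(sample), EXAMPLE_LAYOUT)
for name, value in fields.items():
    print("%-8s 0x%x" % (name, value))
```

Whether the decoded fields would say anything useful about the deadlock
is a separate question.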

>> >The first app that froze as far as I could tell was xmms.
>>
>> Actually, the locks suggest that the problem started with pid 678 -
>> kdeinit. This is unlikely to be
>
>Well, xmms is just the first app where it became apparent :)
>PID 678 is really kded (at least it is at the moment

3.jpg shows pid 678 as kdeinit.  (I'm not really sure what purpose
kdeinit processes serve other than clogging up the process table but I
don't run KDE and am trying to convince my son to give up on it).

>Btw, is my assumption that this is a kernel problem, not a problem with 
>any of my applications, correct?

Apart from the rtprio/idprio inversion problems (which you've already
ruled out) it shouldn't be anything application related.  Kris's point
about disk problems is well taken - I hadn't mentioned it because
there would be syslog and console messages.

Unfortunately, at this point, I'm a bit stumped.

-- 
Peter Jeremy


