Date: Thu, 23 Dec 2004 10:26:21 +0000 (GMT)
From: Robert Watson <rwatson@freebsd.org>
To: Benjamin Lutz <benlutz@datacomm.ch>
Cc: freebsd-stable@freebsd.org
Subject: Re: slow system freeze
Message-ID: <Pine.NEB.3.96L.1041223102130.89131C-100000@fledge.watson.org>
In-Reply-To: <200412230408.48770.benlutz@datacomm.ch>
On Thu, 23 Dec 2004, Benjamin Lutz wrote:

> I'm having a problem with FreeBSD 5.3 here. The system slowly freezes.
>
> It starts with one application that just locks up. Other applications
> still work, but when I switch to them and do stuff in them, they usually
> lock up after a few seconds as well. Starting new processes or logging in
> at a physical console does not work anymore, and after about 30 secs the
> whole system is frozen. Nothing is printed to the first physical console
> or the logs. This has happened both under load and while the system was
> mostly idle (just me irc'ing).
>
> Now, I realize that this description is very vague, but maybe you can tell
> me how to even start debugging this? There's no panic, i.e. no kernel dump
> I could analyze.
>
> I'm no kernel developer, but if I had to guess it sounds like a scheduler
> problem, i.e. some table being overwritten.
>
> I've attached my dmesg for reference.

This is actually fairly symptomatic of a deadlock, either due to a leaked
lock, a literal lock deadlock, or a resource deadlock. If you can get to
the console, either by switching away from X or via a serial console,
compile your kernel with DDB+KDB, break to the debugger, and run the
following:

  ps
  show threads
  show lockedvnods

You might also try building with INVARIANTS and WITNESS support, and see
whether the failure mode becomes an assertion failure instead of a wedge.
With WITNESS compiled in, you can also get more extensive debugging
information using "show locks" and "show witness".

Ideally, with a serial console, you can copy and paste the results of these
commands into an e-mail. If you don't have a serial console, it's a bit
more laborious; what you're looking for is lots of threads blocked in
similar wait channels in the ps output.
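If the ps output was captured over a serial console, the search for many threads blocked in similar wait channels can be scripted. The Python sketch below is a hypothetical helper, not part of FreeBSD or DDB. It assumes the 5.x-style layout shown in this message, where a sleeping process has a state field such as "[SLPQ ttyin 0xc13e1c10][SLP]" and the word after "SLPQ" is the wmesg; entries in any other state (for example, blocked on a lock) are simply skipped, so treat the result as a rough first pass.

```python
"""Tally wait messages (wmesg) in DDB "ps" output saved from a serial console.

Hypothetical helper, not part of FreeBSD. Assumes the FreeBSD 5.x layout in
which a sleeping process shows a state field like
"[SLPQ ttyin 0xc13e1c10][SLP]", where the word after "SLPQ" is the wmesg.
"""
import sys
from collections import Counter

SLPQ_MARKER = "[SLPQ "


def tally_wmesg(ps_text: str) -> Counter:
    """Return a Counter mapping each wmesg to its number of sleeping entries."""
    counts: Counter = Counter()
    for line in ps_text.splitlines():
        start = line.find(SLPQ_MARKER)
        if start == -1:
            # Header line, or a process in some other state: not counted here.
            continue
        fields = line[start + len(SLPQ_MARKER):].split()
        if fields:
            counts[fields[0]] += 1
    return counts


# The example output from this message, used when no capture file is given.
SAMPLE = """\
db> ps
  pid proc     uid ppid pgrp flag    stat  wmesg wchan            cmd
  586 c168adc8   0  585  585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000   0  559  585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0   0  558  559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8   0    1  558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0   0    1  557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8   0    1  556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
"""

if __name__ == "__main__":
    # Pass a saved console capture as the first argument; otherwise use SAMPLE.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    # Most frequent wait messages first.
    for wmesg, n in tally_wmesg(text).most_common():
        print(f"{n:5d}  {wmesg}")
```

Run against a saved capture (the file name is up to you), it lists the most common wait messages first; a large group of wedged processes sharing one wmesg is the pattern worth reporting.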
You'll see lots of output like this:

db> ps
  pid proc     uid ppid pgrp flag    stat  wmesg wchan            cmd
  586 c168adc8   0  585  585 0000002 [SLPQ ttyin 0xc13e1c10][SLP] cu
  585 c16ca000   0  559  585 0004002 [SLPQ ttyin 0xc13e5410][SLP] cu
  559 c16867e0   0  558  559 0004002 [SLPQ pause 0xc1686814][SLP] csh
  558 c16869d8   0    1  558 0004102 [SLPQ wait 0xc16869d8][SLP] login
  557 c1686bd0   0    1  557 0004002 [SLPQ ttyin 0xc13ee810][SLP] getty
  556 c1686dc8   0    1  556 0004002 [SLPQ ttyin 0xc13f4c10][SLP] getty
                                           ^^^^^^^^^^^^^^^^ this stuff

What we want to know is what the common entries in the "wmesg" column are,
particularly for processes that are known to be in a wedge state. If doing
this by hand, we don't need the output of "show threads", but knowing how
many lines and what sort of lines appear in "show lockedvnods" would be
useful.

You can find some reasonable documentation on how to get started on kernel
debugging in the handbook. I'm not sure it addresses live debugging via DDB
in great detail, so I guess I'll take a look and flesh it out some over the
holidays if there isn't enough information there.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research