Date: Tue, 24 Apr 2007 23:43:03 -0400
From: Kris Kennaway <kris@obsecurity.org>
To: Jan Mikkelsen <janm@transactionware.com>
Cc: 'Kostik Belousov' <kostikbel@gmail.com>, freebsd-stable@freebsd.org, 'LI Xin' <delphij@delphij.net>
Subject: Re: 6.2-STABLE deadlock?
Message-ID: <20070425034303.GA44054@xor.obsecurity.org>
In-Reply-To: <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>
References: <462DDB4D.8080507@delphij.net> <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>

On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote:
> LI Xin wrote:
> > Kostik Belousov wrote:
> > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > >> On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
> > >>> At work, amongst my stable of old computers running FreeBSD, I have a
> > >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory. This
> > >>> primarily runs Nagios and a small and lightly used MySQL database, along
> > >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> > >>> disc subsystem, ruling out crash dumps.
> > >>>
> > >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> > >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > >> rather sooner after the hang. Processes with wmesg=ufs feature often in
> > >> the ps output.
> > >>
> > >> http://www.stade.co.uk/crash1/
> > >
> > > I would suspect the mlx controller. There are several processes (for instance,
> > > 988, 50918) waiting for completion of a block read, and processes in the "ufs"
> > > state are the result of the lock cascade, IMHO.
> >
> > I'm not very sure if this is specific to one disk controller. Actually
> > I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> > (slightly patched version) where most of the processes were stuck in the
> > 'ufs' state, under very light load; the box was equipped with amr(4) RAID.
> >
> > I was not able to reproduce the problem at my lab, though, and it's still
> > unknown how to trigger the livelock :-( Still need to investigate
> > on their production system.
>
> I have seen something similar once, on a machine with an Areca (arcmsr)
> controller, running 6.2-RELEASE (with unionfs patches). Processes stuck in
> "ufs", and the machine needed physical intervention to reboot. I haven't
> seen it since. From memory, it happened during startup of the applications
> and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070425034303.GA44054>