From: "Jan Mikkelsen"
To: "'LI Xin'", "'Kostik Belousov'"
Cc: freebsd-stable@freebsd.org
Date: Wed, 25 Apr 2007 11:53:32 +1000
Subject: RE: 6.2-STABLE deadlock?
Message-ID: <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>
In-Reply-To: <462DDB4D.8080507@delphij.net>

LI Xin wrote:
> Kostik Belousov wrote:
> > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> >> On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
> >>> At work, amoungst my stable of old computers running FreeBSD, I have a
> >>> Fujitsu M800 - a 4 Zeon SMP processor with 4 GB of memory. This
> >>> primarily runs Nagios and a small and lightly used MySQL database, along
> >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> >>> disc subsystem, ruling out crash dumps.
> >>>
> >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> >> rather sooner after the hang. Processes with wmesg=ufs feature often in
> >> the ps output.
> >>
> >> http://www.stade.co.uk/crash1/
> >
> > I would suspect the mlx controller. There is several processes (for instance,
> > 988, 50918) waiting for completion of block read, and processes in the "ufs"
> > states are the result of the lock cascade, IMHO.
>
> I'm not very sure if this is specific to one disk controller. Actually
> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> (slightly patched version) that most of processes stuck in the 'ufs'
> state, under very light load, the box was equipped with amr(4) RAID.
>
> I was not able to reproduce the problem at my lab, though, it's still
> unknown that how to trigger the livelock :-( Still need some
> investigate on their production system.

I have seen something similar once, on a machine with an Areca (arcmsr)
controller, running 6.2-RELEASE (with unionfs patches).
Processes were stuck in "ufs", and the machine needed physical
intervention to reboot. I haven't seen it since. From memory, it happened
during startup of the applications and jails on the machine.

Jan.