Date: Wed, 19 Jul 2006 12:04:01 -0300 (ADT)
From: User Freebsd
To: Robert Watson
Cc: Kostik Belousov, Achim_Leubner@adaptec.com, freebsd-stable@freebsd.org
Subject: Re: file system deadlock - the whole story?
Message-ID: <20060719115948.M1799@ganymede.hub.org>
In-Reply-To: <20060719154447.K5132@fledge.watson.org>

On Wed, 19 Jul 2006, Robert Watson wrote:

> On Wed, 19 Jul 2006, User Freebsd wrote:
>
>>> Yes, this was going to be my next question -- if you're seeing wedges
>>> under load and there's a common controller in use, maybe we're looking at
>>> a driver bug.  Bugs of that sort typically look a lot like what you
>>> describe: an I/O is "lost" and so everything that depends on the I/O wedges
>>> waiting for it, leading to a lot of processes hanging around waiting for
>>> vnode locks, etc.
>>
>> 'k, but how do we debug *that*? :(  If it was one, I'd suspect hardware ...
>> but *three*, and only acting up *after* upgrading to FreeBSD 6.x, and only
>> acting up under load ...
>
> There are two normal approaches:
>
> (1) Switch controllers and see if the problem goes away, then blame the
>     controller that was replaced. :-)
>
> (2) Debug the driver when the system is in the wedged state.  When Scott Long
>     helped me out with an identical problem with the 3ware driver a few years
>     ago, he basically added debugging output for the driver in the debugger
>     to list the state of outstanding I/Os, count the number of in-bound,
>     out-bound I/Os, etc, to try and find where the missing one was leaked.
>     My impression is that once he had confirmed the presence of the problem,
>     it was fairly easy to fix, but that confirming it required quite a bit
>     of paperwork.

'k, first question is will the core file provide any insight into this?  ie.
provide further confirmation that it looks like the driver vs file system?

second question, who is currently maintaining the iir driver?
I've CC'd Achim in this, as he's listed in the man page as being the
maintainer ...

Now, uranus has all the various kernel debugging enabled right now, and a
serial console, so we're good for the debugging side of things ... and I
believe that I can fairly easily "recreate" the issue by just moving a whack
of vServers onto that machine to give it the load that seems to kill it ...

*and* uranus is one of my newer machines, so the card that is in it is fairly
new ... but, since I have a full BIOS serial console working on it, I should
be able to get full model # and firmware version, which I take it will help
some?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664