From: Scott Long
Date: Wed, 19 Jul 2006 09:26:41 -0600
To: User Freebsd
Cc: Kostik Belousov, Achim_Leubner@adaptec.com, Robert Watson, freebsd-stable@freebsd.org
Subject: Re: file system deadlock - the whole story?
Message-ID: <44BE4F31.9020606@samsco.org>
In-Reply-To: <20060719115948.M1799@ganymede.hub.org>
List-Id: Production branch of FreeBSD source code

User Freebsd wrote:
> On Wed, 19 Jul 2006, Robert Watson wrote:
>
>> On Wed, 19 Jul 2006, User Freebsd wrote:
>>
>>>> Yes, this was going to be my next question -- if you're seeing
>>>> wedges under load and there's a common controller in use, maybe
>>>> we're looking at a driver bug.  Bugs of that sort typically look a
>>>> lot like what you describe: an I/O is "lost" and so everything that
>>>> depends on the I/O wedges waiting for it, leading to a lot of
>>>> processes hanging around waiting for vnode locks, etc.
>>>
>>>
>>> 'k, but how do we debug *that*? :(  If it was one, I'd suspect
>>> hardware ... but *three*, and only acting up *after* upgrading to
>>> FreeBSD 6.x, and only acting up under load ...
>>
>>
>> There are two normal approaches:
>>
>> (1) Switch controllers and see if the problem goes away, then blame the
>>     controller that was replaced. :-)
>>
>> (2) Debug the driver when the system is in the wedged state.  When
>>     Scott Long helped me out with an identical problem with the 3ware
>>     driver a few years ago, he basically added debugging output for the
>>     driver in the debugger to list the state of outstanding I/Os, count
>>     the number of in-bound, out-bound I/Os, etc, to try and find where
>>     the missing one was leaked.  My impression is that once he had
>>     confirmed the presence of the problem, it was fairly easy to fix,
>>     but that confirming it required quite a bit of paperwork.
>
>
> 'k, first question is will the core file provide any insight into this?
> ie. provide further confirmation that it looks like the driver vs file
> system?
>
> second question, who is currently maintaining the iir driver?  I've CC'd
> Achim in this, as he's listed in the man page as being the maintainer ...
>
> Now, uranus has all the various kernel debugging enabled right now, and
> a serial console, so we're good for the debugging side of things ... and
> I believe that I can fairly easily "recreate" the issue by just moving a
> whack of vServers onto that machine to give it the load that seems to
> kill it ... *and* uranus is one of my newer machines, so the card that
> is in it is fairly new ... but, since I have a full BIOS serial console
> working on it, I should be able to get full model # and firmware
> version, which I take it will help some?
>

What exact version of FreeBSD are you dealing with?

Scott
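
For readers following the thread: approach (2) quoted above (list the state of outstanding I/Os and compare how many went in against how many came back) can be sketched roughly as below. This is only a userland model in Python, not code from the iir or 3ware drivers. The class name IoTracker, its methods, and the request descriptions are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IoTracker:
    """Hypothetical bookkeeping for requests handed to a controller."""
    submitted: int = 0
    completed: int = 0
    # request id -> human-readable description of the request
    outstanding: dict = field(default_factory=dict)

    def submit(self, req_id, desc):
        """Record a request at the point it is handed to the hardware."""
        self.submitted += 1
        self.outstanding[req_id] = desc

    def complete(self, req_id):
        """Record a completion; an unknown id is itself a bookkeeping bug."""
        if req_id not in self.outstanding:
            raise KeyError(f"completion for unknown request {req_id}")
        del self.outstanding[req_id]
        self.completed += 1

    def report(self):
        """Summarize the counters and list every request still in flight."""
        lines = [
            f"submitted={self.submitted} completed={self.completed} "
            f"outstanding={len(self.outstanding)}"
        ]
        for req_id, desc in sorted(self.outstanding.items()):
            lines.append(f"  req {req_id}: {desc}")
        return "\n".join(lines)


# Three requests go in, only two come back: request 2 is the "lost" I/O.
tracker = IoTracker()
tracker.submit(1, "read block 100")
tracker.submit(2, "write block 200")
tracker.submit(3, "read block 300")
tracker.complete(1)
tracker.complete(3)
print(tracker.report())
```

On a wedged system the interesting part is the list of requests that never completed: anything still outstanding long after the load has stopped is a candidate for the leaked I/O.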