Date: Wed, 19 Jul 2006 09:26:41 -0600
From: Scott Long <scottl@samsco.org>
To: User Freebsd <freebsd@hub.org>
Cc: Kostik Belousov <kostikbel@gmail.com>, Achim_Leubner@adaptec.com, Robert Watson <rwatson@freebsd.org>, freebsd-stable@freebsd.org
Subject: Re: file system deadlock - the whole story?
Message-ID: <44BE4F31.9020606@samsco.org>
In-Reply-To: <20060719115948.M1799@ganymede.hub.org>
References: <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <cone.1152136419.991036.72616.1000@zoraida.natserv.net> <20060705234514.I70011@fledge.watson.org> <20060715000351.U1799@ganymede.hub.org> <20060715035308.GJ32624@deviant.kiev.zoral.com.ua> <20060718074804.W1799@ganymede.hub.org> <20060719112424.GK1464@deviant.kiev.zoral.com.ua> <20060719082627.H1799@ganymede.hub.org> <20060719151327.H5132@fledge.watson.org> <20060719112208.Y1799@ganymede.hub.org> <20060719154447.K5132@fledge.watson.org> <20060719115948.M1799@ganymede.hub.org>

User Freebsd wrote:
> On Wed, 19 Jul 2006, Robert Watson wrote:
>
>> On Wed, 19 Jul 2006, User Freebsd wrote:
>>
>>>> Yes, this was going to be my next question -- if you're seeing
>>>> wedges under load and there's a common controller in use, maybe
>>>> we're looking at a driver bug.  Bugs of that sort typically look a
>>>> lot like what you describe: an I/O is "lost" and so everything that
>>>> depends on the I/O wedges waiting for it, leading to a lot of
>>>> processes hanging around waiting for vnode locks, etc.
>>>
>>> 'k, but how do we debug *that*? :(  If it was one, I'd suspect
>>> hardware ... but *three*, and only acting up *after* upgrading to
>>> FreeBSD 6.x, and only acting up under load ...
>>
>> There are two normal approaches:
>>
>> (1) Switch controllers and see if the problem goes away, then blame the
>>     controller that was replaced. :-)
>>
>> (2) Debug the driver when the system is in the wedged state.  When
>>     Scott Long helped me out with an identical problem with the 3ware
>>     driver a few years ago, he basically added debugging output for
>>     the driver in the debugger to list the state of outstanding I/Os,
>>     count the number of in-bound, out-bound I/Os, etc, to try and find
>>     where the missing one was leaked.  My impression is that once he
>>     had confirmed the presence of the problem, it was fairly easy to
>>     fix, but that confirming it required quite a bit of paperwork.
>
> 'k, first question is will the core file provide any insight into this?
> ie. provide further confirmation that it looks like the driver vs file
> system?
>
> second question, who is currently maintaining the iir driver?  I've CC'd
> Achim in this, as he's listed in the man page as being the maintainer ...
>
> Now, uranus has all the various kernel debugging enabled right now, and
> a serial console, so we're good for the debugging side of things ... and
> I believe that I can fairly easily "recreate" the issue by just moving a
> whack of vServers onto that machine to give it the load that seems to
> kill it ... *and* uranus is one of my newer machines, so the card that
> is in it is fairly new ... but, since I have a full BIOS serial console
> working on it, I should be able to get full model # and firmware
> version, which I take it will help some?
>

What exact version of FreeBSD are you dealing with?

Scott