From: Robert Watson
Date: Wed, 19 Jul 2006 15:46:46 +0100 (BST)
To: User Freebsd
Cc: Kostik Belousov, freebsd-stable@freebsd.org
Subject: Re: file system deadlock - the whole story?
Message-ID: <20060719154447.K5132@fledge.watson.org>
In-Reply-To: <20060719112208.Y1799@ganymede.hub.org>

On Wed, 19 Jul 2006, User Freebsd wrote:

>> Yes, this was going to be my next question -- if you're seeing wedges under
>> load and there's a common controller in use, maybe we're looking at a
>> driver bug.
>> Bugs of that sort typically look a lot like what you
>> describe: an I/O is "lost" and so everything that depends on the I/O wedges
>> waiting for it, leading to a lot of processes hanging around waiting for
>> vnode locks, etc.
>
> 'k, but how do we debug *that*? :(  If it was one, I'd suspect hardware ...
> but *three*, and only acting up *after* upgrading to FreeBSD 6.x, and only
> acting up under load ...

There are two normal approaches:

(1) Switch controllers and see if the problem goes away, then blame the
    controller that was replaced. :-)

(2) Debug the driver when the system is in the wedged state.  When Scott Long
    helped me out with an identical problem with the 3ware driver a few years
    ago, he basically added debugging output for the driver in the debugger to
    list the state of outstanding I/Os, count the number of in-bound and
    out-bound I/Os, etc., to try to find where the missing one was leaked.  My
    impression is that once he had confirmed the presence of the problem, it
    was fairly easy to fix, but that confirming it required quite a bit of
    work.

Robert N M Watson
Computer Laboratory
University of Cambridge
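
A minimal sketch of the kind of accounting described in approach (2), written
as plain userland C so it can be compiled anywhere.  This is not the actual
3ware change; every name here (struct softc, req_submit, req_complete,
req_dump, MAX_REQS) is hypothetical, and a real driver would call the dump
routine from the kernel debugger rather than use printf.  The idea is only
that each request records when it was handed to the controller, so a dump
taken in the wedged state shows which request never came back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_REQS 8                      /* hypothetical request table size */

enum req_state { REQ_FREE = 0, REQ_SUBMITTED };

struct req {
    enum req_state state;
    unsigned long  submit_tick;         /* when it was handed to the controller */
};

struct softc {                          /* hypothetical per-controller state */
    struct req    reqs[MAX_REQS];
    unsigned long submitted;            /* total requests handed out */
    unsigned long completed;            /* total completions seen */
};

/* Hand a request to the (simulated) controller; returns its slot, or -1 if full. */
static int req_submit(struct softc *sc, unsigned long now)
{
    assert(sc != NULL);
    for (int i = 0; i < MAX_REQS; i++) {
        if (sc->reqs[i].state == REQ_FREE) {
            sc->reqs[i].state = REQ_SUBMITTED;
            sc->reqs[i].submit_tick = now;
            sc->submitted++;
            return i;
        }
    }
    return -1;
}

/* Completion path: only a request that is actually outstanding is counted. */
static void req_complete(struct softc *sc, int slot)
{
    assert(sc != NULL);
    if (slot < 0 || slot >= MAX_REQS || sc->reqs[slot].state != REQ_SUBMITTED)
        return;
    sc->reqs[slot].state = REQ_FREE;
    sc->completed++;
}

/*
 * Debug dump: print the counters and every outstanding request, and return
 * how many have been outstanding for longer than max_age ticks.
 */
static int req_dump(const struct softc *sc, unsigned long now, unsigned long max_age)
{
    int suspects = 0;

    assert(sc != NULL);
    printf("submitted=%lu completed=%lu outstanding=%lu\n",
        sc->submitted, sc->completed, sc->submitted - sc->completed);
    for (int i = 0; i < MAX_REQS; i++) {
        if (sc->reqs[i].state != REQ_SUBMITTED)
            continue;
        unsigned long age = now - sc->reqs[i].submit_tick;
        int old = age > max_age;
        printf("  slot %d: outstanding for %lu ticks%s\n",
            i, age, old ? " (suspect)" : "");
        suspects += old;
    }
    return suspects;
}
```

On a healthy system the outstanding count hovers near zero between bursts; in
the wedged state, a request whose age keeps growing across repeated dumps is
the candidate for the "lost" I/O.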