From: Robert Watson
Date: Tue, 13 Jan 2009 00:09:24 +0000 (GMT)
To: Pete French
Cc: freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject: Re: Big problems with 7.1 locking up :-(

On Mon, 12 Jan 2009, Pete French wrote:

>> I'm not sure if you've done this already, but the normal suggestions apply:
>> have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do
>> any results / panics / etc result?  Sometimes these debugging tools are
>> able to convert hangs into panics, which gives us much more ability to
>> debug them.
>
> OK, I have now had a machine hang again, with the correct debug options in
> the kernel. The screen looked like this when I went to restart it:
>
> http://toybox.twisted.org.uk/~pete/71_lor2.png
>
> It had not, however, dropped into any kind of debugger.
> Also there appear to be console messages after the lock order reversal -
> is that normal?

Lock order reversals are warnings of potential deadlock due to a lock cycle,
but deadlocks may not actually result, either because it's a false positive
(some locking construct that is deadlock free but involves lock cycles), or
because a cycle didn't actually form.  The message is suggestive, but if you
have significant system activity after the message, then the reversal may be
unrelated to the hang.

> The machine did stay up for a significant amount of time before doing this. I
> notice that it is more or less identical to the one I posted when I had
> WITNESS_KDB in the kernel too, so maybe those results aren't entirely
> spurious after all?
>
> Given it hasn't dropped to a debugger, is there anything else I can try?

Features like WITNESS and INVARIANTS may change the timing of the kernel,
making certain race conditions less likely; I'd run with them for a bit and
see if you can reproduce the hang with them present, as they will make
debugging the problem a lot easier, if it's possible.

Robert N M Watson
Computer Laboratory
University of Cambridge
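
To illustrate the point about lock order reversals being warnings rather than
deadlocks, here is a minimal sketch of an order checker. It is a toy model and
not how WITNESS is actually implemented; the class name LockOrderChecker, its
methods, and the lock names "A" and "B" are all hypothetical. It records which
locks were taken while others were held, and warns when a new acquisition
would close a cycle in that recorded order.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order tracker; all names here are hypothetical."""

    def __init__(self):
        # order[a] holds every lock that has been acquired while a was held
        self.order = defaultdict(set)
        self.held = []       # locks currently held, oldest first
        self.warnings = []   # (held_lock, new_lock) pairs that closed a cycle

    def _reachable(self, src, dst):
        # Depth-first search over the recorded "acquired before" edges
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.order[node])
        return False

    def acquire(self, lock):
        for h in self.held:
            # If lock -> ... -> h was recorded earlier, taking lock while
            # holding h reverses that order and closes a cycle.
            if self._reachable(lock, h):
                self.warnings.append((h, lock))
            self.order[h].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


checker = LockOrderChecker()

# First code path: take A, then B
checker.acquire("A")
checker.acquire("B")
checker.release("B")
checker.release("A")

# Second code path, run later by the same thread: take B, then A
checker.acquire("B")
checker.acquire("A")
checker.release("A")
checker.release("B")

print(checker.warnings)  # [('B', 'A')]
```

The two paths run one after the other here, so nothing ever blocks, yet the
reversal is still reported. That is the sense in which the message is only
suggestive: it says a deadlock is possible if the two orders ever race, not
that one occurred.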