From: Pete French
Date: Tue, 13 Jan 2009 11:45:37 +0000
To: rwatson@FreeBSD.org
Cc: freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject: Re: Big problems with 7.1 locking up :-(
List-Id: Production branch of FreeBSD source code

> Lock order reversals are warnings of potential deadlock due to a lock cycle,
> but deadlocks may not actually result, either because it's a false positive
> (some locking construct that is deadlock free but involves lock cycles), or
> because a cycle didn't actually form. The message is suggestive, but if you
> have significant system activity after the message, then it may be unrelated.

It's hard to tell in this case as there are no timestamps, so I can't see
if there is any activity after the lockup.

> Features like WITNESS and INVARIANTS may change the timing of the kernel
> making certain race conditions less likely; I'd run with them for a bit and
> see if you can reproduce the hang with them present, as they will make
> debugging the problem a lot easier, if it's possible.

Uh, the above *was* me reproducing the hang with them present ;-)) It quite
happily hangs with those things in the kernel - indeed the next hang was
immediately after I rebooted the machine. But even with WITNESS and INVARIANTS
and all the rest it does not drop to a debugger, it simply locks up.

That machine is currently turned off, but still has 7.1 installed. What would
you like me to try now? I have a lockup I can reproduce pretty reliably now
(just wait and it will always lock up). I also found that my other 7.1 box
locks up fairly reliably when doing a buildworld.

The only thing these two machines have in common that the ones which don't
lock up lack is that these are serving DNS. The others don't. Note that all
the hardware is identical, as is the installed software and the configuration.

I am at a total loss...

-pete.