Date: Sun, 4 Jul 2004 21:42:40 +0200
From: Daniel Lang
To: current@freebsd.org
Subject: WITNESS bug (Was Re: kern/68442: panic - acquiring duplicate lock of same type: "sleepq chain")
Message-ID: <20040704194240.GA1658@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <20040701210317.GA86225@atrbg11.informatik.tu-muenchen.de>
References: <20040628202434.GA73213@atrbg11.informatik.tu-muenchen.de>
 <20040629170014.GC1144@green.homeunix.org>
 <20040629183557.GA77135@atrbg11.informatik.tu-muenchen.de>
 <200406291453.34291.jhb@FreeBSD.org>
 <20040701153221.GC84986@atrbg11.informatik.tu-muenchen.de>
 <20040701210317.GA86225@atrbg11.informatik.tu-muenchen.de>
List-Id: Discussions about the use of FreeBSD-current

Hi again

Daniel Lang wrote on Thu, Jul 01, 2004 at 11:03:17PM +0200:
[..]
> > However, the panic is obviously triggered inside the witness
> > code, because *lock_list = 0x0 in line 749.
> > Although a few lines above, the list is checked for being
> > empty (line 707), just as Colin has already pointed out from
> > the first trace I could get. But between line 707 and 749
> > there is no obvious modification to this list. I am not sure
> > what 'find_instance()' does?
> > So maybe another thread on another CPU has modified the locklist
> > meanwhile? Is this possible?
> [..]
>
> I just removed WITNESS from the kernel and will see what happens.
> If this is some strange corruption it may show up somewhere
> else if WITNESS is removed. Maybe it will be more obvious
> then. If it doesn't crash any more, this could mean the
> WITNESS code itself is broken.
>
> Btw, the addition of WITNESS is indeed something that has
> changed since all the trouble started. When the machine was
> still running in a stable fashion I did not have WITNESS
> enabled. I enabled it when I put in more memory and built a PAE
> kernel, and have left it in since.
[..]

I am now more convinced that the bug is indeed in WITNESS itself.
Without WITNESS the machine has now been running rock solid again
for three days. With WITNESS enabled it crashed within minutes,
or after at most a few hours of uptime. Also, the stack trace I got
points into WITNESS code.

For me, I can run without WITNESS, but I guess that's not how
it is supposed to be....

Tomorrow I will reinsert the additional memory and re-enable PAE,
but certainly not WITNESS. If it still runs stable, I will return
it to full production.

If anyone is still interested in looking into this bug, please
let me know. I still have the crash dump around.

Cheers,
 Daniel
--
IRCnet: Mr-Spock - My name is Pentium of Borg, division is futile,
                   you will be approximated. -
Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/