From: Stefan Ehmann
To: Don Lewis
cc: current@FreeBSD.org
Subject: Re: kernel pointer polka, possibly by mount_nfs
Date: Fri, 12 Dec 2003 11:10:50 +0100
Message-Id: <1071223849.1494.21.camel@shoeserv.freebsd>
In-Reply-To: <200312110649.hBB6nDeF054514@gw.catspoiler.org>

On Thu, 2003-12-11 at 07:49, Don Lewis wrote:
> On 10 Dec, Poul-Henning Kamp wrote:
> >
> > I have a 100% reproducible case here where it looks like mount_nfs
> > tramples on the softc of a led(4) device.
> >
> > Stock -current kernel, HZ=1000, I've added a couple of sanity-checks
> > in the timeout routine of led(4) and they trigger reliably on a
> > byte which should not have been zero.
> >
> > In all cases so far, the currently running program is mount_nfs run
> > from /etc/rc.mumble somewhere.
> >
> > The machine is a Soekris 4501 booting diskless.
> >
> > I have also seen a reproducible page fault panic in in_pcbremlist()
> > if I put "set -x" as the second line in /etc/rc on the same machine,
> > it smells the same to me.
> >
> > This problem likely affects 5.2-WHATEVER as well, and could be
> > responsible for other Heisenbugs, and could be considered a
> > showstopper.
>
> That sounds somewhat like the Heisenbug I've been on the hunt for in
> the last few weeks. This one liked to munch some file system's struct
> mount, or whatever structure that mnt_data was pointing to. The system
> in question typically blew up when attempting to lock mnt_lock in
> vfs_busy(). The trigger appeared to be the use of read-only ext2fs. The
> user who reported this problem said that the system would panic after a
> few hours. After getting the user to sprinkle KASSERT()s around, I've
> pretty much come to the conclusion that the bug is not in the code for
> the vfs top half. Another bit of data is that the struct mount getting
> nuked doesn't appear to belong to ext2fs. It's hard to tell whose it is
> though because it gets zeroed.
>
> I use NFS on my two -CURRENT boxes and haven't run into any problems,
> and I also haven't been able to reproduce any panics with ext2fs, though
> I haven't exercised that nearly as much.

I guess you are talking about my panics.

Since we don't seem to be making any progress: would it help to find out
when the change that causes the problem was made?

I ran an end-of-September kernel for nearly two months without panics.
The Nov 23 kernel panics about 3 times a day. So the change that
introduced the problem should be somewhere in those two months.

Since this may take quite some time (and a lot of kernel and world
builds), I'll only do it if there is a good chance that it will reveal
the source of the problem.
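
To give a feel for the cost of narrowing down the date, here is a minimal sketch of a bisection over source dates. It is only an illustration: the exact "good" date (2003-09-30), the culprit date, and the is_bad() callback are assumptions. In practice is_bad() would mean checking out sources as of that date, building kernel and world, and running the box long enough to see whether it panics.

```python
from datetime import date, timedelta


def bisect_dates(good, bad, is_bad):
    """Binary search for the first bad date.

    good   -- a date known to give a stable kernel
    bad    -- a later date known to give a panicking kernel
    is_bad -- callback returning True if sources from that date panic
    Returns (first_bad_date, number_of_builds_tested).
    """
    builds = 0
    while (bad - good).days > 1:
        # Pick the midpoint between the last good and first known bad date.
        mid = good + timedelta(days=(bad - good).days // 2)
        builds += 1
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, builds


# Hypothetical example: pretend the breaking change went in on 2003-10-20.
assumed_culprit = date(2003, 10, 20)
first_bad, builds = bisect_dates(
    date(2003, 9, 30),   # assumed "end of September" date
    date(2003, 11, 23),  # kernel date reported as panicking
    lambda d: d >= assumed_culprit,
)
print(first_bad, builds)
```

With roughly 54 days between the two kernels, this comes to about six build-and-test rounds. Since a "good" result needs a long run without a panic, each round could still take a day or more.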