Date:      Fri, 12 Dec 2003 10:32:03 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        shoesoft@gmx.net
Cc:        current@FreeBSD.org
Subject:   Re: kernel pointer polka, possibly by mount_nfs
Message-ID:  <200312121832.hBCIW3eF058641@gw.catspoiler.org>
In-Reply-To: <1071223849.1494.21.camel@shoeserv.freebsd>

On 12 Dec, Stefan Ehmann wrote:
> On Thu, 2003-12-11 at 07:49, Don Lewis wrote:

>> 
>> That sounds somewhat like the Heisenbug I've been on the hunt for in
>> the last few weeks.  This one liked to munch some file system's struct
>> mount, or whatever structure that mnt_data was pointing to.  The system
>> in question typically blew up when attempting to lock mnt_lock in
>> vfs_busy().  The trigger appeared to be the use of read-only ext2fs. The
>> user who reported this problem said that the system would panic after a
>> few hours.  After getting the user to sprinkle KASSERT()s around, I've
>> pretty much come to the conclusion that the bug is not in the code for the
>> vfs top half.  Another bit of data is that the struct mount getting
>> nuked doesn't appear to belong to ext2fs.  It's hard to tell whose it is
>> though because it gets zeroed.
>> 
>> I use NFS on my two -CURRENT boxes and haven't run into any problems,
>> and I also haven't been able to reproduce any panics with ext2fs, though
>> I haven't exercised that nearly as much.
> 
> I guess you are talking about my panics. Since we don't seem to make any
> progress - would it help to find out when the change that causes the
> problem was made?
> 
> I was running an end-of-September kernel for nearly two months without
> the panics that now hit 3 times a day. The kernel of Nov 23 had these
> problems, so the change should be located somewhere in those two months.
> 
> Since this may take quite some time (and a lot of kernel and world
> builds), I'll only do it if there is a good chance that it will reveal
> the source of the problem.

Unfortunately, that may be the fastest way to track down the culprit.
The only other way would be to write a more aggressive assertion checker
function that validates the integrity of all the mount structures and
sprinkle lots of calls to this function around the kernel.

I also diff'ed the 2003/09/23 and 2003/11/23 versions of the ext2fs code
and didn't see anything suspicious.  That means that either the culprit
change is something subtle in ext2fs, or it is elsewhere in the kernel.
