From owner-freebsd-current@FreeBSD.ORG  Sun Nov 30 18:35:57 2003
Message-Id: <200312010235.hB12ZieF025215@gw.catspoiler.org>
Date: Sun, 30 Nov 2003 18:35:44 -0800 (PST)
From: Don Lewis <truckman@FreeBSD.org>
To: shoesoft@gmx.net
In-Reply-To: <1070238961.5827.17.camel@shoeserv.freebsd>
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
cc: current@FreeBSD.org
Subject: Re: 5.2-BETA panic: page fault

On  1 Dec, Stefan Ehmann wrote:
> On Mon, 2003-12-01 at 01:10, Don Lewis wrote:
>> Can you reproduce this problem without bktr?
>> 
> <snip>
>> You are getting a double panic, with the second happening during the
>> file system sync.  The code seems to be tripping over the same mount
>> list entry each time.  Maybe the mount list is getting corrupted.  Are
>> you using amd?  Print *lkp in the lockmgr() stack frame.
>> 
>> 
>> You might want to add
>> 	KASSERT(mp->mnt_lock.lk_interlock != NULL,
>> 	    ("vfs_busy: NULL mount pointer interlock"));
>> at the top of vfs_busy() and right before the lockmgr() call.
> 
> No, I'm not using amd.
> 
> (kgdb) print *lkp
> $1 = {lk_interlock = 0x0, lk_flags = 0, lk_sharecount = 0,
>   lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 0, lk_wmesg = 0x0,
>   lk_timo = 0, lk_lockholder = 0x0, lk_newlock = 0x0}
> 
> This is indeed just NULLs.

Not good.  Nothing should be writing to lk_interlock once it has been
initialized.  Either something is stomping on an active struct mount,
we're still using it after it has been put on the free list, or
dp->v_mountedhere is pointing somewhere bogus.  I don't suspect the last
of these because the second panic() in the sync() code doesn't follow
this path to get to the struct mount.
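
To make the check concrete, here is a rough userland sketch of the test I
have in mind: walk a list of mounts and flag the first one whose lock has a
NULL lk_interlock.  Everything named mock_* is made up for illustration, the
field types are guesses based on the kgdb output, and the plain next pointer
stands in for the kernel's real mount list, so treat this as a model of the
idea and not as kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for a few of the fields kgdb printed; types are guesses. */
struct mock_lock {
	void		*lk_interlock;	/* NULL in the dump; should point at a mutex */
	unsigned	 lk_flags;
	int		 lk_sharecount;
	const char	*lk_wmesg;
};

/* Hypothetical, heavily simplified mount entry. */
struct mock_mount {
	struct mock_mount	*mnt_next;	/* simplified; not the kernel's list linkage */
	struct mock_lock	 mnt_lock;
};

/* Nonzero if the lock looks like it has been initialized. */
int
mock_lock_initialized(const struct mock_lock *lkp)
{
	return (lkp != NULL && lkp->lk_interlock != NULL);
}

/* Return the first list entry whose lock looks wiped, or NULL if none. */
struct mock_mount *
mock_find_damaged_mount(struct mock_mount *head)
{
	struct mock_mount *mp;

	for (mp = head; mp != NULL; mp = mp->mnt_next)
		if (!mock_lock_initialized(&mp->mnt_lock))
			return (mp);
	return (NULL);
}

/* Userland analogue of the suggested assertion: abort on a wiped lock. */
void
mock_vfs_busy_precheck(const struct mock_mount *mp)
{
	assert(mock_lock_initialized(&mp->mnt_lock));
}
```

A lock that reads back as all zeroes, like the one in the dump, fails
mock_lock_initialized(), so the precheck would stop at the first use of the
damaged entry instead of faulting later inside the lock code.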

> I haven't tried without bktr yet but I hope I'll have time for that (and
> the KASSERT) tomorrow.
> 
> The panic only seems to happen when accessing my read-only mounted ext2
> partition. Today I tried not to access any data there and uptime is
> 14h30min now. The panic always happened after a few hours. So this is
> probably the core of the problem.

That sounds like a possibility.  I might be able to try that here when I
have some idle time on my -CURRENT box.

Can you print *dp->v_mountedhere in the lookup() frame?  That should
show the mount point information and might show if anything else in
struct mount is damaged.