From: Robert Watson
Date: Sun, 24 Aug 2003 21:04:08 -0400 (EDT)
To: Gavin Atkinson
cc: kuriyama@imgsrc.co.jp
cc: current@freebsd.org
In-Reply-To: <20030825011106.L23215-100000@ury.york.ac.uk>
Subject: Re: sysinstall spec_getpages panic (with VM overtones)

On Mon, 25 Aug 2003, Gavin Atkinson wrote:

> On Wed, 20 Aug 2003, Robert Watson wrote:
> > On Wed, 20 Aug 2003, Gavin Atkinson wrote:
> > > _mtx_lock_flags(0,0,c0529513,300,ffffffff) at _mtx_lock_flags+0x43
> > > spec_getpages(cce33b3c,54,0,cce33b2c,0) at spec_getpages+0x26c
> > > ffs_getpages(cce33b80,0,c05459de,274,c05c63e0) at ffs_getpages+0x5f6
> > > vnode_pager_getpages(c0bebafc,cce33c70,1,0,cce33c20) at vnode_pager_getpages+0x73
> > > vm_fault(c1259900,819b000,1,0,c12534c0) at vm_fault+0x8e2
> > > trap_pfault(cce33d48,1,819b004,200,819b004) at trap_pfault+0x109
> > > trap(2f,2f,2f,82e533c,0) at trap+0x1fc
> > > calltrap() at calltrap+0x5
> > >
> > > *c0529513 = "/usr/src/sys/fs/specfs/spec_vnops.c", line 0x300 is line 768:
> > >
> > > 766 gotreqpage = 0;
> > > 767 VM_OBJECT_LOCK(vp->v_object);
> > > 768 vm_page_lock_queues();
> > > 769 for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
> >
> > Is it ap->a_vp that's NULL, or vp->v_object that's NULL? vp is
> > dereferenced several times before that in the code, so if vp is really
> > NULL at line 767, we're probably talking about memory corruption. But if
> > vp->v_object is NULL, then it could be we're not creating a VM object
> > along some code path.
>
> Although this panic is 100% reproducible during the initial install
> through sysinstall, I have tried hard but cannot reproduce it once
> the system is installed and running multiuser, even by performing the
> same actions within sysinstall. I have also tried without success
> to get a crash dump of the panic; however, after a fair bit of head
> scratching it looks from a grep of the source code like the "dumpdev"
> loader variable documented in loader(8) is not yet implemented... and as
> far as I can tell there is no other way I can get the installer off CD
> to generate a dump.
>
> I'm trying to make a release with extra debugging info, but won't be
> able to test this until at least Wednesday or Thursday. What extra
> debugging info would be useful? Who would be the best person to discuss
> this with? From what kuriyama said, it appears that it is indeed
> vp->v_object that is null, so I have added the following to
> spec_vnops.c just before the lock that fails:
>
>	if (vp->v_object == NULL)
>		panic("vp->v_object is null in %s, rdev=%s", __func__,
>		    devtoname(vp->v_rdev));
>
> Hopefully that will help diagnose the cause a little further, but I'm
> really working blind here - this is not an area of the kernel I
> understand at all. If there is any other debugging info I can provide
> that may be useful, I'm happy to have a go. Kuriyama, if you have any
> spare time before I am able to do it, maybe you could add the above code
> and find out what message it panics with?

Alan Cox just made a commit a couple of days ago that seems to resolve the
problem for us. Here's the commit message so you can give it a try.

alc         2003/08/22 10:50:32 PDT

  FreeBSD src repository

  Modified files:
    sys/fs/specfs        spec_vnops.c
  Log:
  Use the requested page's object field instead of the vnode's.  In some
  cases, the vnode's object field is not initialized leading to a NULL
  pointer dereference when the object is locked.

  Tested by:	rwatson

  Revision  Changes    Path
  1.208     +5 -2      src/sys/fs/specfs/spec_vnops.c
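
For anyone following along, the idea described in the commit log can be
sketched roughly as below. This is only a simplified user-space model, not
the actual diff in revision 1.208: the struct layouts are cut down to a
single field each, and the helpers model_object_lock(), lock_via_vnode()
and lock_via_reqpage() are hypothetical names made up for illustration.
Only the field names v_object (on the vnode) and object (on the page) are
taken from the discussion above.

```c
/*
 * Simplified user-space model of the locking change described in the
 * commit log.  NOT the real kernel structures and NOT the actual diff.
 */
#include <assert.h>
#include <stddef.h>

struct vm_object { int locked; };
struct vm_page   { struct vm_object *object; };   /* page's owning object */
struct vnode     { struct vm_object *v_object; }; /* may be left NULL */

/*
 * Hypothetical stand-in for VM_OBJECT_LOCK().  Returns -1 in the case
 * where the kernel would dereference a NULL pointer, 0 otherwise.
 */
int
model_object_lock(struct vm_object *obj)
{
	if (obj == NULL)
		return (-1);
	obj->locked = 1;
	return (0);
}

/* Old pattern: reach the object through the vnode's field. */
int
lock_via_vnode(struct vnode *vp)
{
	return (model_object_lock(vp->v_object));
}

/* New pattern: reach the object through the requested page's field. */
int
lock_via_reqpage(struct vm_page **m, int reqpage)
{
	return (model_object_lock(m[reqpage]->object));
}
```

In this model, a vnode whose v_object was never set makes lock_via_vnode()
fail, while lock_via_reqpage() still succeeds, because the requested page
carries a pointer to its own object. Whether the real code path matches
this picture in every detail is an assumption on my part.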