From owner-freebsd-current  Sun Mar  5 18:24:44 1995
Return-Path: current-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id SAA18128 for current-outgoing; Sun, 5 Mar 1995 18:24:44 -0800
Received: from sbstark.cs.sunysb.edu (sbstark.cs.sunysb.edu [130.245.1.47]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id SAA18122 for <current@freebsd.org>; Sun, 5 Mar 1995 18:24:43 -0800
Received: from starkhome.UUCP (root@localhost) by sbstark.cs.sunysb.edu (8.6.9/8.6.9) with UUCP id VAA17917 for current@freebsd.org; Sun, 5 Mar 1995 21:24:32 -0500
Received: by starkhome.cs.sunysb.edu (8.6.10/1.34)
	id VAA01305; Sun, 5 Mar 1995 21:20:14 -0500
Date: Sun, 5 Mar 1995 21:20:14 -0500
From: starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark)
Message-Id: <199503060220.VAA01305@starkhome.cs.sunysb.edu>
To: current@FreeBSD.org
Subject: Page fault panics during make world in -current
Sender: current-owner@FreeBSD.org
Precedence: bulk

As I mentioned in a previous message to "current", I am getting page fault
panics with a -current kernel during a make world.  Specifically, this seems
to occur during the library install phase.  I have also had problems during
shutdown and reboot, though I don't know if it's the same thing.

I spent a couple of hours looking over a crash dump just now, and here
is what I am seeing.  A call originates from vm_fault_additional_pages()
via vnode_pager_haspage() to perform VM paging I/O on a file at offset
0x24000, which with a filesystem bsize of 8192 corresponds to logical block
number 18.  This being beyond the number of direct blocks (12) in the file,
the first indirect block is required.  This indirect block has logical block
number -12, according to the coding scheme being used for the file metadata.
The call goes through ufs_bmap() to ufs_bmaparray(), where it is determined
that the desired data lives at disk address 65896, so a call is made
to getblk() at line 180.  In getblk(), the flag "doingvmio" is set to 1,
and a call is made to allocbuf(), which apparently then goes bananas,
eventually triggering a panic, which may actually occur from within the
call to vm_page_lookup() at line 1005.
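
The block arithmetic above can be checked with a small sketch.  This is not
kernel code: the names sketch_lblkno(), sketch_needs_indirect() and
sketch_first_indirect_lbn() are made up for illustration, and the constants
are simply the figures quoted above (bsize 8192, 12 direct blocks, first
indirect block numbered -12).

```c
#include <stdint.h>

/* Assumed values, taken from the figures quoted in the message. */
#define SKETCH_BSIZE   8192  /* filesystem block size in bytes */
#define SKETCH_NDADDR  12    /* number of direct blocks in the file */

/* Logical block number of the data block that holds a byte offset. */
static int32_t sketch_lblkno(uint32_t offset)
{
    return (int32_t)(offset / SKETCH_BSIZE);
}

/* Nonzero if the block lies beyond the direct blocks, so that an
 * indirect block must be read to find its disk address. */
static int sketch_needs_indirect(int32_t lbn)
{
    return lbn >= SKETCH_NDADDR;
}

/* Logical block number used for the first indirect block under the
 * negative-number coding scheme described above (reported as -12). */
static int32_t sketch_first_indirect_lbn(void)
{
    return -SKETCH_NDADDR;
}
```

With these definitions, sketch_lblkno(0x24000) gives 18, which is past the
12 direct blocks, so the lookup has to go through the indirect block.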

It appears to me that the problem is that the negative logical block number
is filtering down to the call to allocbuf().  From the way the logical
block numbers are used in allocbuf() to compute offsets into VM objects
(which are unsigned, at least they used to be, and it is hard to imagine
things changing that substantially), it looks like allocbuf() is not
supposed to be called with a negative logical block number and vmio=1.
Is this right?
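
To illustrate why a negative block number would be a problem here, the
following sketch shows what happens if one is turned into an unsigned byte
offset.  It is only an assumption about the kind of computation involved:
sketch_object_offset() is a hypothetical helper, not the code in allocbuf(),
and a 32-bit unsigned offset type is assumed.

```c
#include <stdint.h>

#define SKETCH_BSIZE 8192  /* filesystem block size quoted above */

/* Hypothetical: byte offset into a VM object for a logical block,
 * computed in unsigned 32-bit arithmetic (an assumption). */
static uint32_t sketch_object_offset(int32_t lbn)
{
    /* A negative lbn wraps around modulo 2^32 when converted. */
    return (uint32_t)((uint32_t)lbn * (uint32_t)SKETCH_BSIZE);
}
```

For block 18 this yields 0x24000 as expected, but for block -12 it yields
0xFFFE8000, an offset nowhere near anything that belongs to the file.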

My current theory is that somehow the setting of doingvmio to 1 in getblk()
is wrong, or some sort of synchronization problem is causing this setting
to be inappropriate before the call to allocbuf() is made.  It could be
associated with paging from/to a shared library that is being removed or
updated.

I can provide more information from the crash dump, if that will help.

							- Gene