From owner-freebsd-current Mon Mar 6 08:19:46 1995
Return-Path: current-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id IAA09699 for current-outgoing; Mon, 6 Mar 1995 08:19:46 -0800
Received: from sbstark.cs.sunysb.edu (sbstark.cs.sunysb.edu [130.245.1.47]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id IAA09692 for ; Mon, 6 Mar 1995 08:19:44 -0800
Received: from starkhome.UUCP (root@localhost) by sbstark.cs.sunysb.edu (8.6.9/8.6.9) with UUCP id LAA18876 for current@freebsd.org; Mon, 6 Mar 1995 11:19:36 -0500
Received: by starkhome.cs.sunysb.edu (8.6.10/1.34) id LAA04199; Mon, 6 Mar 1995 11:17:46 -0500
Date: Mon, 6 Mar 1995 11:17:46 -0500
From: starkhome!gene@sbstark.cs.sunysb.edu (Gene Stark)
Message-Id: <199503061617.LAA04199@starkhome.cs.sunysb.edu>
To: davidg@Root.COM
CC: current@FreeBSD.org, dyson@Root.COM
In-reply-to: David Greenman's message of Mon, 06 Mar 1995 07:34:32 -0800 <199503061534.HAA00614@corbin.Root.COM>
Subject: Page fault panics during make world in -current
Sender: current-owner@FreeBSD.org
Precedence: bulk

> The code in vfs_bio.c is quite complex. John and I have each gone through
>this several times trying to find problems like you've mentioned. We're pretty
>sure that the page in question is always made 'busy' or 'bmapped' before any
>calls to VM_WAIT (or any other sleep) could otherwise lose the page. I'm not
>saying that we might not have missed something...but we have looked at this
>specific potential problem more than once. The object itself can't go away
>because a reference is held to it.

OK, I understand, but the current instability of the system seems to
indicate some sort of subtle problem, so I figure having a fresh eye take
a look at the code might stand a chance of finding something. I hope you'll
pardon me if I "find" stuff that isn't a problem, as the assumptions/invariants,
etc. that are inherent in this code take awhile to flesh out by reading the
code over and over.
I am still concerned about line 1046 of vfs_bio.c, though. At line 1031,
m is determined to be either invalid or busy. At line 1046 there is a
possibility of sleeping in the VM_WAIT. If m is invalid, then I don't think
there is anything stopping a pager from replacing m in the object with another
page during the sleep, so that when we wake up again, m isn't a reference to
the proper page in this object any more. If m was busy, of course, this can't
happen, because the pagers respect the busy flags and don't replace the pages
in this case.

I have the feeling a good test to exercise some of these potential problems
would be to mmap() a file, then start accessing it via the mapped addresses,
concurrently with another process that repeatedly truncates and rewrites it.
Do you have a test like this?

- Gene
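
A test of the kind suggested above might be sketched in C roughly as follows. This is not from the original message: the function name mmap_truncate_stress, the file size, the fill byte and the 512-byte read stride are all illustrative assumptions. A child process repeatedly truncates the file to zero and rewrites it, while the parent keeps reading through a shared mapping. Reads that land past the truncated end of file are expected to raise SIGBUS, which the sketch catches and skips; any byte that is neither zero nor the fill pattern is counted as suspicious.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define FILE_SIZE (64 * 1024)   /* illustrative size, a multiple of 4096 */
#define PATTERN   0x5a          /* illustrative fill byte */

static sigjmp_buf bus_env;

/* Touching a mapped page beyond the truncated end of file raises SIGBUS. */
static void on_sigbus(int sig) { (void)sig; siglongjmp(bus_env, 1); }

/* Rewrite the whole file with PATTERN. Returns 0 on success. */
static int fill_file(int fd)
{
    char buf[4096];
    memset(buf, PATTERN, sizeof buf);
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    for (size_t done = 0; done < FILE_SIZE; done += sizeof buf)
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
            return -1;
    return 0;
}

/* Returns the number of unexpected bytes seen, or -1 on a setup failure. */
long mmap_truncate_stress(const char *path, int rounds)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || fill_file(fd) != 0)
        return -1;

    volatile unsigned char *map =
        mmap(NULL, FILE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    pid_t writer = fork();
    if (writer < 0)
        return -1;
    if (writer == 0) {
        /* Child: repeatedly truncate and rewrite the file. */
        for (int i = 0; i < rounds; i++)
            if (ftruncate(fd, 0) != 0 || fill_file(fd) != 0)
                _exit(1);
        _exit(0);
    }

    /* Parent: read through the mapping until the writer exits. */
    volatile long bad = 0;
    volatile size_t off;
    int status = 0;
    while (waitpid(writer, &status, WNOHANG) == 0) {
        for (off = 0; off < FILE_SIZE; off += 512) {
            if (sigsetjmp(bus_env, 1) != 0)
                continue;               /* offset was past EOF; skip it */
            unsigned char c = map[off];
            if (c != PATTERN && c != 0)
                bad++;
        }
    }

    munmap((void *)map, FILE_SIZE);
    close(fd);
    unlink(path);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return bad;
}
```

Calling mmap_truncate_stress("/tmp/stress.dat", 1000) from a small main() and checking for a zero return is one way to run it. On a kernel with the kind of race described above, the expected symptom would be a panic or a nonzero count rather than a clean return.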