From owner-freebsd-current  Sun Aug 16 22:02:06 1998
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id WAA15203
          for freebsd-current-outgoing; Sun, 16 Aug 1998 22:02:06 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id WAA15172
          for <current@FreeBSD.ORG>; Sun, 16 Aug 1998 22:02:02 -0700 (PDT)
          (envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.8.8/8.8.8) id WAA17008;
	Sun, 16 Aug 1998 22:01:28 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209)
 via SMTP by smtp04.primenet.com, id smtpd016966; Sun Aug 16 22:01:25 1998
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id WAA18680;
	Sun, 16 Aug 1998 22:01:20 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <199808170501.WAA18680@usr09.primenet.com>
Subject: Re: Better VM patches (was Tentative fix for VM bug)
To: dg@root.com
Date: Mon, 17 Aug 1998 05:01:20 +0000 (GMT)
Cc: tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net
In-Reply-To: <199808161506.IAA02398@implode.root.com> from "David Greenman" at Aug 16, 98 08:05:59 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> >To elaborate: I think that the behaviour I am seeing in the "mmap'ed
> >file contents, on a page boundary, written to another file" case is a
> >result of a single page being in multiple maps.
> 
>    Uh, what does "single page being in multiple maps" mean exactly? What map?
> Pages belong to objects. A page can belong to only one object (there's only
> one set of list pointers, so nothing else is possible. :-)).

This is precisely what is happening in the cases I have observed in
deployed machines:

	The contents of the cron tab are being overwritten with the
	contents of a dbm file which the dbm routines mmap'ed.  The
	new content of the file begins on a page boundary.

> If you see some
> of one file show up in another, then this suggests to me that a bzero isn't
> happening when it needs to, resulting in stale cached file data being
> considered valid.

This would be true if the file data was intact up to the EOF, then
contained the cached data.  This is not what is happening.  What is
happening suggests that when mmap'ed backing store is discarded,
an object alias remains after the discard, and the physical page
is reused while it is still referenced.
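
For illustration only, the hypothesis above (a reference that survives
the discard, followed by reuse of the physical page) can be modeled in
user space. This is a toy sketch, not FreeBSD VM code: every name in it
(toy_frame, toy_object, toy_pool, toy_alloc, toy_free, toy_alias_demo)
is hypothetical, and the one-frame pool is an assumption made purely to
force the reuse.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical stand-ins; none of these names exist in the FreeBSD VM code. */
struct toy_frame {
    char data[TOY_PAGE_SIZE];
    int  in_use;
};

struct toy_object {
    struct toy_frame *page;   /* the single page this toy object owns */
};

/* A pool of exactly one frame, so the next allocation must reuse it. */
static struct toy_frame toy_pool[1];

static struct toy_frame *toy_alloc(void)
{
    if (toy_pool[0].in_use)
        return NULL;
    memset(toy_pool[0].data, 0, TOY_PAGE_SIZE);
    toy_pool[0].in_use = 1;
    return &toy_pool[0];
}

static void toy_free(struct toy_frame *f)
{
    f->in_use = 0;
}

/*
 * Returns 1 if a write through the stale alias replaced the second
 * object's contents, 0 if not, -1 if the toy pool was exhausted.
 */
int toy_alias_demo(void)
{
    struct toy_object dbm, cron;
    struct toy_frame *stale;
    int clobbered;

    /* The "dbm file" object gets the frame and fills it. */
    dbm.page = toy_alloc();
    if (dbm.page == NULL)
        return -1;
    strcpy(dbm.page->data, "dbm contents");

    /* The backing store is discarded, but an alias is left behind. */
    stale = dbm.page;
    toy_free(dbm.page);
    dbm.page = NULL;

    /* The "crontab" object is handed the same physical frame. */
    cron.page = toy_alloc();
    if (cron.page == NULL)
        return -1;
    strcpy(cron.page->data, "crontab contents");

    /* A late write through the stale alias lands in the crontab's page. */
    strcpy(stale->data, "dbm contents");

    clobbered = (strcmp(cron.page->data, "dbm contents") == 0);
    toy_free(cron.page);
    return clobbered;
}
```

In this model the overwrite starts at offset 0 of the reused frame, which
is at least consistent with the foreign data beginning on a page boundary;
it says nothing about where in the real code such an alias would be left.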


>    I have a suggestion. Let's not throw out random guesses about what may or
> may not be a problem. Let's actually understand the issue thoroughly, come up
> with a fix, and then tell people all about it.

The case I am worried about with the setsize fix is the case where
the file was mmap'ed.  The file did not end on an even page boundary.
The file is extended, either below or exactly on a page boundary.
The first "if" in "setsize" is true because the file length was
erroneously saved in the object.  A subsequent read reference on the
mmap'ed section, which is believed to be swapped out, results in a
page reactivation.  The perfectly valid data (written via normal
file I/O) in the extended area is lost.  So the setsize code fixes
a real bug, even if it breaks an abstraction.  The race window from
my comment refers to a read completion interrupt from a disk controller
being processed in the time between the bogus value being set in the
alloc, and the value being corrected.
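
A user-space check for the scenario just described might look like the
sketch below. It is an assumption-laden test harness, not a reproduction
of the kernel code path: the function name check_extend_coherence, the
100-byte overhang, and the 'A'/'B' fill bytes are all made up for
illustration. It maps a file that does not end on a page boundary,
touches the partial last page, extends the file to exactly a page
boundary with ordinary write(2), and then compares what the mapping shows
against what read(2) returns.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if the mapped view and the read(2) view agree after the
 * file is extended, 1 if they differ, -1 on a setup error.
 * The file at 'path' is created, used, and removed.
 */
int check_extend_coherence(const char *path)
{
    size_t page, initial, final;
    char *buf = NULL;
    char *map = MAP_FAILED;
    volatile char sink;
    int fd, result = -1;
    long pg = sysconf(_SC_PAGESIZE);

    if (pg <= 0)
        return -1;
    page = (size_t)pg;
    initial = page + 100;   /* file ends partway into the second page */
    final = 2 * page;       /* extended length: exactly on a page boundary */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    buf = malloc(final);
    if (buf == NULL)
        goto out;

    /* Write the initial, non-page-aligned contents. */
    memset(buf, 'A', initial);
    if (write(fd, buf, initial) != (ssize_t)initial)
        goto out;

    /* Map both pages and touch the partial one so it is resident. */
    map = mmap(NULL, final, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    sink = map[initial - 1];
    (void)sink;

    /* Extend the file with normal file I/O; the offset is already at EOF. */
    memset(buf, 'B', final - initial);
    if (write(fd, buf, final - initial) != (ssize_t)(final - initial))
        goto out;

    /* Read the whole file back and compare it with the mapped view. */
    if (lseek(fd, 0, SEEK_SET) != 0)
        goto out;
    if (read(fd, buf, final) != (ssize_t)final)
        goto out;
    result = (memcmp(map, buf, final) != 0);

out:
    if (map != MAP_FAILED)
        munmap(map, final);
    free(buf);
    close(fd);
    unlink(path);
    return result;
}
```

A caller would pass a scratch path on the filesystem under test and treat
a return of 1 as "the extended area was lost or stale in one of the two
views". A single pass like this will not hit the race window; it would
have to run in a loop, under memory pressure, to have any chance of doing
that.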


The XXX comment I pointed to is an indication that orphaned objects
can and do exist.  This is a separate problem, and can't result in
the "mmap'ed file contents overwrite valid file contents" corruption
that got me interested in this in the first place.  Neither could
John's orphaned object fix for invalidation of good pages, nor my
order fix on the protection before the invalidation.

All that said, I think that my fix does not resolve all of the problems,
only some of them.  The "backup one if you can" kludge I originally
posted was more diagnostic than anything else; it was very informative
to me that (1) it didn't result in more "freeing free page" panics, and
(2) it didn't help the mmap bug in the news server, which I think is a
more repeatable version of the crontab corruption bug I am suffering.

So anyway, I hope I've convinced you that there is actually an error
in the allocate code setting the size to a page boundary; the valid = 0
bug is so obvious (once John Dyson pointed it out after looking at my
"backup one" patch 8-)) that it doesn't need more justification.

Anyway, I'm currently looking for similar order of operation things;
if I can come up with patches that fix Karl Denninger's news server,
I'm pretty convinced my problem will go away as well.

If you would prefer I target my patches instead of posting to the
list, I'll post asking for people who have problems and restrict the
distribution.  Should I do this?  I'd really prefer that diagnostic
patches get the widest possible audience, especially if they make
things worse.  Getting a limited set of positive responses is the
worst thing that could happen if there is a negative effect that
goes undiscovered.  8-(.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
