Date:      Sun, 26 Jul 1998 21:40:41 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        wollman@khavrinen.lcs.mit.edu (Garrett Wollman)
Cc:        karl@mcs.net, dswartz@druber.com, current@FreeBSD.ORG
Subject:   Re: MMAP problems
Message-ID:  <199807262140.OAA17295@usr01.primenet.com>
In-Reply-To: <199807260252.WAA05646@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Jul 25, 98 10:52:34 pm

> I've been seeing it for several months.
> 
> I believe it to be a coherency problem.  The relevant operations here
> are:
> 
> 1) A diablo server process appends to a spool file using explicit
> I/O.  (Note that the file is not opened in O_APPEND mode.)
> 
> 2) A boatload of dnewslink processes simultaneously mmap the pages of
> the spool file containing the article in question, suck the article
> out of it, and blast it over to the remote feed.
> 
> Here's my particular guess...  I think this happens when the dnewslink
> processes are reading another, short, article in the last page of the
> file, while a diablo server is writing a new article.  Somewhere,
> there is a race condition in which the kernel has copied the new data
> into the buffer, but blocks before it updates the valid length; this
> then allows one of the mmaps to succeed, and since that part of the
> buffer is marked invalid, it gets zeroed.  Then the diablo process
> resumes, and marks the end of the buffer valid, although the data it
> was writing has just gotten clobbered.
> 
> It looks, from an inspection of the relevant code in ufs_readwrite.c
> and ffs_balloc.c, that this cannot happen, because the data are always
> copied in last.  It does appear that there are potential windows, if
> ffs_balloc() blocks, where other processes might see invalid data in
> the file through mmap as a result of vnode_pager_setsize() having
> already been run, but it does not appear such garbage could possibly
> persist and be written back to disk, and I certainly see it directly
> on the disk, not just in memory.


I think it is a bit more insidious than this.

I have been able to reproduce this reliably using the dbm mmap()
code.  The sequence appears to be:

1)	Open the password file using an access method that leaves
	it open.

2)	Read some pages.

3)	Go to sleep for a time.

4)	Run a program from cron every 1 minute.  The "newsyslog"
	program is ideal.

5)	Wait for the pages associated with the mapped region to be
	LRU'ed out.

6)	Notice that the pages are invalidated on the descriptor, but
	not from the mmap().

7)	Insert comment /* here is the bug */.

8)	Now wake up and access the password file data again.  The
	data will be refreshed into the supposedly invalidated
	page, corrupting whichever file is now reusing that page.

9)	See the data written to your crontab, or any other file
	that happens to have inherited the physical page backing
	both the mmap'ed region and, incorrectly, the file being
	corrupted.

So it seems to me that this is an uncounted reference problem
specific to the mmap code.

The problem does not occur with anonymous memory not backed by a
vnode (i.e., SYSVSHM), even under heavy stress.
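
The access pattern in steps 1 to 3 and 8 above can be sketched roughly
as follows.  This is a minimal illustration, not the dbm code itself:
probe_mapping() and region_sum() are hypothetical names, and the
checksum only shows whether the mapped file's own contents changed
across the sleep.  The corruption described in step 9 lands in other
files, so those would have to be checked separately, and a mismatch
here can also mean the file was legitimately rewritten in the
meantime.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Cheap rolling checksum; only meant to notice that bytes changed. */
static uint32_t
region_sum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + p[i];
	return (sum);
}

/*
 * Hypothetical probe: map a file and keep it mapped (step 1), touch
 * every page (step 2), sleep (step 3), then touch the pages again
 * (step 8).  Returns 0 if both passes saw the same bytes, 1 if they
 * differed, -1 on error.
 */
static int
probe_mapping(const char *path, unsigned int nap_seconds)
{
	struct stat st;
	unsigned char *base;
	uint32_t before, after;
	size_t len;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &st) == -1 || st.st_size <= 0) {
		close(fd);
		return (-1);
	}
	len = (size_t)st.st_size;

	base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (base == MAP_FAILED) {
		close(fd);
		return (-1);
	}

	before = region_sum(base, len);	/* fault the pages in */
	sleep(nap_seconds);		/* give the pages time to be reclaimed */
	after = region_sum(base, len);	/* fault them back in */

	munmap(base, len);
	close(fd);
	return (before != after);
}
```

To approximate the scenario above, one would call probe_mapping() on
the password file with a sleep of several minutes while cron activity
and memory pressure continue in the background.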


For the case you describe, if in fact it is a bug in the file
extension, then a race window is involved.  You can close the
race window using explicit calls to msync().  I don't think that
is what is happening, but you can try adding the calls to the
code to see whether they fix the problem.
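
As a rough sketch of what "adding the calls" might look like on the
reader side, the fragment below maps the pages containing one article,
calls msync() on the mapping, and only then copies the article out.
The helper name read_article() and its arguments are hypothetical, and
the choice of MS_SYNC | MS_INVALIDATE is an assumption about which
flags are useful here; nothing in this thread says where dnewslink
would actually need the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical reader: copy len bytes starting at file offset
 * `offset` into `out`, going through a shared read-only mapping that
 * is msync()ed before use.  Returns 0 on success, -1 on error.
 */
static int
read_article(int fd, off_t offset, size_t len, char *out)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	off_t map_off;
	size_t skew, map_len;
	char *base;
	int rc = 0;

	assert(len > 0 && offset >= 0);
	if (pagesize <= 0)
		return (-1);

	/* mmap() offsets must be page aligned, so round down. */
	map_off = offset - (offset % pagesize);
	skew = (size_t)(offset - map_off);
	map_len = skew + len;

	base = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, map_off);
	if (base == MAP_FAILED)
		return (-1);

	/*
	 * Assumption: ask the kernel to reconcile the mapping with the
	 * file before the data is read, in the hope of closing the
	 * window discussed above.
	 */
	if (msync(base, map_len, MS_SYNC | MS_INVALIDATE) == -1)
		rc = -1;
	else
		memcpy(out, base + skew, len);

	if (munmap(base, map_len) == -1)
		rc = -1;
	return (rc);
}
```

If the corruption persists with the msync() call in place, that would
suggest the file-extension race is not the cause.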


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message


