Date: Sun, 26 Jul 1998 21:40:41 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: wollman@khavrinen.lcs.mit.edu (Garrett Wollman)
Cc: karl@mcs.net, dswartz@druber.com, current@FreeBSD.ORG
Subject: Re: MMAP problems
Message-ID: <199807262140.OAA17295@usr01.primenet.com>
In-Reply-To: <199807260252.WAA05646@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Jul 25, 98 10:52:34 pm
> I've been seeing it for several months.
>
> I believe it to be a coherency problem. The relevant operations here
> are:
>
> 1) A diablo server process appends to a spool file using explicit
> I/O. (Note that the file is not opened in O_APPEND mode.)
>
> 2) A boatload of dnewslink processes simultaneously mmap the pages of
> the spool file containing the article in question, suck the article
> out of it, and blast it over to the remote feed.
>
> Here's my particular guess... I think this happens when the dnewslink
> processes are reading another, short, article in the last page of the
> file, while a diablo server is writing a new article. Somewhere,
> there is a race condition in which the kernel has copied the new data
> into the buffer, but blocks before it updates the valid length; this
> then allows one of the mmaps to succeed, and since that part of the
> buffer is marked invalid, it gets zeroed. Then the diablo process
> resumes, and marks the end of the buffer valid, although the data it
> was writing has just gotten clobbered.
>
> It looks, from an inspection of the relevant code in ufs_readwrite.c
> and ffs_balloc.c, that this cannot happen, because the data are always
> copied in last. It does appear that there are potential windows, if
> ffs_balloc() blocks, where other processes might see invalid data in
> the file through mmap as a result of vnode_pager_setsize() having
> already been run, but it does not appear such garbage could possibly
> persist and be written back to disk, and I certainly see it directly
> on the disk, not just in memory.

I think it is a bit more insidious than this. I have been able to
reproduce this reliably using the dbm mmap() code. What is apparently
happening is:

1) Open the password file using an access method that leaves it open.

2) Read some pages.

3) Go to sleep for a while.

4) Run a program from cron every minute. The "newsyslog" program is
   ideal.

5) Wait for the pages associated with the mapped region to be LRU'ed
   out.

6) Notice that the pages are invalidated on the descriptor, but not
   from the mmap().

7) Insert comment /* here is the bug */.

8) Now wake up and access the password file data again. The data will
   be refreshed into the supposedly invalidated page, corrupting
   whichever file is now reusing that page's contents.

9) See the data written to your crontab, or to any other file that
   happens to have inherited the physical page backing both the
   mmap'ed region and, incorrectly, the file being corrupted.

So it seems to me that this is an uncounted reference problem specific
to the mmap code. The problem does not occur with anonymous memory not
backed by a vnode (i.e., SYSVSHM), even under heavy stress.

For the case you describe, if it is in fact a bug in file extension,
then a race window is involved. You can close the race window using
explicit calls to msync(). I don't think this is the case, but you can
try adding the calls to the code and see whether they fix the problem.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.