Date: Sun, 26 Jul 1998 09:50:49 -0500
From: Karl Denninger <karl@mcs.net>
To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Cc: Dan Swartzendruber <dswartz@druber.com>, current@FreeBSD.ORG
Subject: Re: MMAP problems
Message-ID: <19980726095049.51700@mcs.net>
In-Reply-To: <199807260252.WAA05646@khavrinen.lcs.mit.edu>; from Garrett Wollman on Sat, Jul 25, 1998 at 10:52:34PM -0400
References: <19980725155148.43084@mcs.net> <3.0.5.32.19980725172640.00944ac0@mail.kersur.net> <19980725163243.36509@mcs.net> <199807260252.WAA05646@khavrinen.lcs.mit.edu>

On Sat, Jul 25, 1998 at 10:52:34PM -0400, Garrett Wollman wrote:
> <<On Sat, 25 Jul 1998 16:32:43 -0500, Karl Denninger <karl@mcs.net> said:
>
> > I can verify that CAM is not related to this; it happens with NON-CAM
> > kernels as well.
>
> I've been seeing it for several months.
>
> I believe it to be a coherency problem.  The relevant operations here
> are:
>
> 1) A diablo server process appends to a spool file using explicit
> I/O.  (Note that the file is not opened in O_APPEND mode.)

Yep.  Do you have any kind of guess as to whether opening the file O_APPEND
would be legit (and would it fix this?)  I don't THINK the server process
ever "backs up", so this *should* be ok, but I don't want to make that
change without having a "better than a guess" shot at it.

> 2) A boatload of dnewslink processes simultaneously mmap the pages of
> the spool file containing the article in question, suck the article
> out of it, and blast it over to the remote feed.

Yep.  That's the basic model.  Diablo beats the shit out of MMAP and I/O;
the code is very clever in trying to avoid unnecessary I/O...

> Here's my particular guess... I think this happens when the dnewslink
> processes are reading another, short, article in the last page of the
> file, while a diablo server is writing a new article.  Somewhere,
> there is a race condition in which the kernel has copied the new data
> into the buffer, but blocks before it updates the valid length; this
> then allows one of the mmaps to succeed, and since that part of the
> buffer is marked invalid, it gets zeroed.  Then the diablo process
> resumes, and marks the end of the buffer valid, although the data it
> was writing has just gotten clobbered.

Hmmm.... why would dnntplink not mmap the file readonly though (and wouldn't
this solve the problem)?

> It looks, from an inspection of the relevant code in ufs_readwrite.c
> and ffs_balloc.c, that this cannot happen, because the data are always
> copied in last.  It does appear that there are potential windows, if
> ffs_balloc() blocks, where other processes might see invalid data in
> the file through mmap as a result of vnode_pager_setsize() having
> already been run, but it does not appear such garbage could possibly
> persist and be written back to disk, and I certainly see it directly
> on the disk, not just in memory.
>
> -GAWollman

Yep.  After about 6 hours of pouring over the code last night (literally and
figuratively :-) this is what I think is going on as well.

And I can confirm that the trash IS being written to disk; it's definitely
there on stable storage when you go look for it later.

The data which gets written is usually a block of zeros, but it may not be;
it can also be random trash.  It's also not always one block (it could be
more than one), but it IS always, at least from what I'm seeing here, a
multiple of 512 bytes (disk blocksize).

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
                             | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
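
The access pattern discussed in this message (one writer appending with explicit I/O, readers mapping the file) and the reported symptom (zeroed runs in multiples of 512 bytes) can be sketched in C roughly as follows. This is only an illustration, not code from Diablo: the function names append_article() and count_zero_sectors() are hypothetical, and the sketch uses O_APPEND and a read-only mapping purely because both are raised as questions above. Nothing in this thread establishes that either one avoids the corruption.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR 512 /* disk block size quoted in the message */

/*
 * Hypothetical helper: append one article to the spool file.
 * With O_APPEND the kernel positions each write at end-of-file.
 * Returns 0 on success, -1 on error or short write.
 */
static int append_article(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/*
 * Hypothetical helper: map the spool read-only and count the
 * 512-byte-aligned blocks that are entirely zero.
 * Returns the count, or -1 on error.
 */
static long count_zero_sectors(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap of length 0 is an error */
        close(fd);
        return 0;
    }

    const unsigned char *p =
        mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    long zero = 0;
    for (off_t off = 0; off + SECTOR <= st.st_size; off += SECTOR) {
        int all_zero = 1;
        for (int i = 0; i < SECTOR; i++) {
            if (p[off + i] != 0) {
                all_zero = 0;
                break;
            }
        }
        zero += all_zero;
    }

    munmap((void *)p, (size_t)st.st_size);
    return zero;
}
```

A checker like count_zero_sectors() only finds the "block of zeros" case. The random-trash case mentioned above would need a comparison against a known-good copy of the articles.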