From owner-freebsd-current Sun Jul 26 07:51:19 1998
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id HAA00536
          for freebsd-current-outgoing; Sun, 26 Jul 1998 07:51:19 -0700 (PDT)
          (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from Kitten.mcs.com (Kitten.mcs.com [192.160.127.90])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA00531
          for ; Sun, 26 Jul 1998 07:51:17 -0700 (PDT)
          (envelope-from karl@Mars.mcs.net)
Received: from Mars.mcs.net (karl@Mars.mcs.net [192.160.127.85])
          by Kitten.mcs.com (8.8.7/8.8.2) with ESMTP id JAA22449;
          Sun, 26 Jul 1998 09:50:50 -0500 (CDT)
Received: (from karl@localhost)
          by Mars.mcs.net (8.8.7/8.8.2) id JAA10307;
          Sun, 26 Jul 1998 09:50:50 -0500 (CDT)
Message-ID: <19980726095049.51700@mcs.net>
Date: Sun, 26 Jul 1998 09:50:49 -0500
From: Karl Denninger
To: Garrett Wollman
Cc: Dan Swartzendruber , current@FreeBSD.ORG
Subject: Re: MMAP problems
References: <19980725155148.43084@mcs.net>
            <3.0.5.32.19980725172640.00944ac0@mail.kersur.net>
            <19980725163243.36509@mcs.net>
            <199807260252.WAA05646@khavrinen.lcs.mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.84
In-Reply-To: <199807260252.WAA05646@khavrinen.lcs.mit.edu>; from Garrett Wollman on Sat, Jul 25, 1998 at 10:52:34PM -0400
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sat, Jul 25, 1998 at 10:52:34PM -0400, Garrett Wollman wrote:
> < said:
>
> > I can verify that CAM is not related to this; it happens with NON-CAM
> > kernels as well.
>
> I've been seeing it for several months.
>
> I believe it to be a coherency problem.  The relevant operations here
> are:
>
> 1) A diablo server process appends to a spool file using explicit
> I/O.  (Note that the file is not opened in O_APPEND mode.)

Yep.  Do you have any kind of guess as to whether opening the file O_APPEND
would be legit (and would it fix this?)
I don't THINK the server process ever "backs up", so this *should* be ok,
but I don't want to make that change without having a "better than a guess"
shot at it.

> 2) A boatload of dnewslink processes simultaneously mmap the pages of
> the spool file containing the article in question, suck the article
> out of it, and blast it over to the remote feed.

Yep.  That's the basic model.  Diablo beats the shit out of MMAP and I/O;
the code is very clever in trying to avoid unnecessary I/O...

> Here's my particular guess...  I think this happens when the dnewslink
> processes are reading another, short, article in the last page of the
> file, while a diablo server is writing a new article.  Somewhere,
> there is a race condition in which the kernel has copied the new data
> into the buffer, but blocks before it updates the valid length; this
> then allows one of the mmaps to succeed, and since that part of the
> buffer is marked invalid, it gets zeroed.  Then the diablo process
> resumes, and marks the end of the buffer valid, although the data it
> was writing has just gotten clobbered.

Hmmm.... why would dnewslink not mmap the file readonly though (and wouldn't
this solve the problem)?

> It looks, from an inspection of the relevant code in ufs_readwrite.c
> and ffs_balloc.c, that this cannot happen, because the data are always
> copied in last.  It does appear that there are potential windows, if
> ffs_balloc() blocks, where other processes might see invalid data in
> the file through mmap as a result of vnode_pager_setsize() having
> already been run, but it does not appear such garbage could possibly
> persist and be written back to disk, and I certainly see it directly
> on the disk, not just in memory.
>
> -GAWollman

Yep.  After about 6 hours of pouring over the code last night (literally
and figuratively :-) this is what I think is going on as well.
And I can confirm that the trash IS being written to disk; it's definitely
there on stable storage when you go look for it later.

The data which gets written is usually a block of zeros, but it may not be;
it can also be random trash.  It's also not always one block (it could be
more than one), but it IS always, at least from what I'm seeing here, a
multiple of 512 bytes (disk blocksize).

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
                             | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost
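
The symptom described above (runs of zeros on disk, always a multiple of
512 bytes) is something one could look for mechanically.  What follows is
only a rough sketch in Python, not anything taken from Diablo: the name
find_zero_runs and the BLOCK constant are made up for illustration, and it
assumes that a healthy spool file of text articles contains no 512-byte
aligned blocks that are entirely zero.  It will not catch the "random
trash" case, only the zeroed one.

```python
BLOCK = 512  # disk block size quoted in the message; an assumption elsewhere


def find_zero_runs(path, block=BLOCK):
    """Return (offset, length) pairs for block-aligned runs of all-zero data.

    Hypothetical helper: reads the file one block at a time and merges
    consecutive all-zero blocks into a single run.
    """
    runs = []
    zero = bytes(block)   # one block's worth of zero bytes to compare against
    start = None          # offset where the current zero run began, if any
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if len(chunk) == block and chunk == zero:
                # Full block of zeros: open a run if none is in progress.
                if start is None:
                    start = offset
            elif start is not None:
                # Non-zero (or short trailing) block closes the current run.
                runs.append((start, offset - start))
                start = None
            offset += len(chunk)
    if start is not None:
        # The file ended while still inside a zero run.
        runs.append((start, offset - start))
    return runs
```

Running it over a spool file gives the byte offsets to go look at by hand.
A short final block is never reported, so a file whose length is not a
multiple of 512 does not produce a false hit at the tail.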