From owner-freebsd-current Sun Jul 26 16:16:42 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id QAA25272
        for freebsd-current-outgoing; Sun, 26 Jul 1998 16:16:42 -0700 (PDT)
        (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from Kitten.mcs.com (Kitten.mcs.com [192.160.127.90])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA25263
        for ; Sun, 26 Jul 1998 16:16:31 -0700 (PDT)
        (envelope-from karl@Mars.mcs.net)
Received: from Mars.mcs.net (karl@Mars.mcs.net [192.160.127.85])
        by Kitten.mcs.com (8.8.7/8.8.2) with ESMTP id SAA01589;
        Sun, 26 Jul 1998 18:15:56 -0500 (CDT)
Received: (from karl@localhost)
        by Mars.mcs.net (8.8.7/8.8.2) id SAA14765;
        Sun, 26 Jul 1998 18:15:55 -0500 (CDT)
Message-ID: <19980726181555.49644@mcs.net>
Date: Sun, 26 Jul 1998 18:15:55 -0500
From: Karl Denninger
To: Terry Lambert
Cc: Mike Smith, wollman@khavrinen.lcs.mit.edu, dswartz@druber.com,
        current@FreeBSD.ORG, dillon@best.net
Subject: Re: MMAP problems
References: <199807261647.JAA10667@antipodes.cdrom.com>
        <199807262228.PAA18917@usr01.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.84
In-Reply-To: <199807262228.PAA18917@usr01.primenet.com>;
        from Terry Lambert on Sun, Jul 26, 1998 at 10:28:08PM +0000
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Sun, Jul 26, 1998 at 10:28:08PM +0000, Terry Lambert wrote:
> > > The data which gets written is usually a block of zeros, but it may not be;
> > > it can also be random trash.  It's also not always one block (it could be
> > > more than one), but it IS always, at least from what I'm seeing here, a
> > > multiple of 512 bytes (disk blocksize).
> >
> > The significant question in light of Garrett's description seems to be
> > whether the trash that's written is actually being written by the
> > process in error because that's what it got from a previous read, or
> > whether the process is actually writing the right stuff and it's being
> > corrupted on the way down.
>
> See other postings.
>
> Because the corrupt data can be non-zero, I do not believe Garrett's
> explanation is the correct one; instead, I believe the same page is
> being pointed to by two mappings at the same time because I don't
> believe that mmap() references are being revoked correctly.

Hmmm...

Yes, I can confirm (for certain) that the corrupt data is not always zero.
It FREQUENTLY is zero, but not always.  If it is non-zero it generally is
identifiable as a chunk of another message (unfortunately I haven't gotten
a HEADER yet; if I do, I will be able to track down where the chunk of data
actually came from).

> Note that I have seen the bug I am describing on both 2.2.6 and 3.0
> systems.  These are production systems that open and hold open the
> password file a long time, and which access the crontab with an
> annoyingly (and probably undesirably) high frequency.  This results
> in corrupt crontabs.  On the other hand, corrupt crontabs are much
> better than silently corrupted user data... 8-(.

This is particularly bad, since one of the potential "fixes" Matt has given
me is to back down to 2.2.6.  Of course, if this problem is IN 2.2.6, then
backing down on that machine will do a big nothing.

I suspect the real culprit is that I'm running basically all my feeds in
"realtime" mode - if I were delaying by 10 minutes, diablo would never be
writing to the same file that the feeder program was reading at a given
time (i.e., the file that was open for MMAP would never be open for write
at the same time).
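
For illustration only, here is a minimal sketch of the access pattern
described above: a file is mapped read-only with mmap() while the same
file is written with write(2), and each block is compared through the
mapping right after it is written.  This is not taken from diablo or the
feeder program.  The function name, the single-process layout, and
pre-sizing the file with ftruncate() are all assumptions made to keep the
sketch short; the real setup uses two processes and a growing file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSZ 512   /* disk block size mentioned in this thread */

/*
 * Hypothetical helper (not from diablo): write nblocks blocks with
 * write(2) and, after each write, read the same block back through a
 * shared read-only mapping of the same file.
 * Returns the number of blocks whose mapped contents differ from what
 * was just written, or -1 if setup or a write fails.
 */
long count_incoherent_blocks(const char *path, size_t nblocks)
{
    assert(nblocks > 0);
    size_t len = nblocks * BLKSZ;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Assumption: pre-size the file so the whole mapping is file-backed. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long bad = 0;
    unsigned char buf[BLKSZ];
    for (size_t i = 0; i < nblocks; i++) {
        /* Fill each block with a recognizable, non-zero pattern. */
        memset(buf, 'A' + (int)(i % 26), sizeof buf);
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            bad = -1;
            break;
        }
        /* The mapping should show exactly what write(2) just stored. */
        if (memcmp(map + i * BLKSZ, buf, sizeof buf) != 0)
            bad++;
    }

    munmap(map, len);
    close(fd);
    unlink(path);
    return bad;
}
```

On a system where the mapping and write(2) share the same cached pages,
a call such as count_incoherent_blocks("/tmp/coherence.dat", 64) would be
expected to return 0; a non-zero count would point at the kind of
mismatch being discussed in this thread.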

I *can* confirm that the files where the corruption is being seen absolutely
ARE open for write when the errors occur; I've managed to catch the system
"in the act" doing this, and have found the file open at the time.

I'm going to put a "q1" flag on all the feeds (which should keep the
system from "chasing its tail" so closely) and see if the problem goes
away.  Doing that should prevent the software from attempting to read via
mmap from the page last written by write(2).

If you're right then this should make the problem disappear.

Then the question becomes how to fix it.

--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
                             | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message