Date: Sun, 26 Jul 1998 19:42:29 -0500
From: Karl Denninger <karl@mcs.net>
To: Matthew Dillon <dillon@backplane.com>
Cc: current@FreeBSD.ORG
Subject: Diablo corruption on the filesystem
Message-ID: <19980726194229.43215@mcs.net>
In-Reply-To: <199807270026.RAA09393@apollo.backplane.com>; from Matthew Dillon on Sun, Jul 26, 1998 at 05:26:11PM -0700
References: <199807270026.RAA09393@apollo.backplane.com>

On Sun, Jul 26, 1998 at 05:26:11PM -0700, Matthew Dillon wrote:
> :Something funny is going on here.
> :
> :Removing the realtime, but NOT putting a Q1 on the feeds did NOT fix it.
> :Specifically, IMMEDIATELY on startup of a new set of dnewslink processes
> :I'd get a whole batch of errors on files that were still open for write in
> :the diablo processes at the time.
> :
> :The q1 seems to prevent that from happening (diablo has moved on to a new
> :file before dnewslink starts up against the old queue files).
> :
> :--
> :--
> :Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
> :http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
>
>     .... or the q1 is resulting in a long enough delay that Diablo's write
>     position is well beyond the point where dnewslink picks up the files,
>     and so is (potentially) beyond the point where a last-block-is-fragment
>     problem would mess up mmap().

That's not the whole problem.

>     It sure sounds to me like a mmap()/file-fragment-block-allocation
>     inconsistency that occurs near the 'end' of the file when one process has
>     the end-portion of the file mmap'd shared+ro and another is actively
>     appending to the file.

Check THIS out.

I turned off USE_PCOMMIT_SHM, USE_PCOMMIT_RW_MAP, USE_KP_RW_MAP and
DO_COMMIT_POSTCACHE and recompiled diablo (but did NOT reinstall dnewslink,
if that matters). This should shut down as much of the MMAPping as I can
in diablo itself.

Now, with q1 set on the feeds, the problem appears to be completely GONE!

With those options ON, the error rate decreased but there was still
corruption happening. Now there is not.

I tried the recompile last night with REALTIME still enabled on a hunch, and
while that cut the error rate down, I was still getting errors - so I assumed
that whatever I had done with the options had been ineffective and I was
seeing instead an artifact of lower-than-normal news rates (since it's the
weekend).

However, with the options off and Q1 set, I'm not getting any errors (20
minutes now with no errors logged).

Something significant happened when I changed the behavior of diablo
internally.

Does this shed any light on the issue? What have I actually done in terms
of the calls made to the mmap routines by disabling these options?

If it runs error-free for an hour or so I'm going to try removing the q1 and
see if the problem reappears (leaving the modified diablo program in
service).

--
--
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
                             | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
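For anyone who wants to poke at the scenario discussed above (a file being appended to while its tail is mapped shared and read-only), here is a rough sketch of a standalone check. It is NOT Diablo or dnewslink code. The function names, the fill byte and the chunk size are all made up for illustration, and it runs the writer and the reader in one process, which is a simplification of the two-process case in the thread. On a correctly behaving kernel it should report zero mismatches.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Append len bytes of `fill` to fd. Returns 0 on success, -1 on error. */
static int append_bytes(int fd, char fill, size_t len)
{
    char buf[512];

    memset(buf, fill, sizeof buf);
    while (len > 0) {
        size_t n = len < sizeof buf ? len : sizeof buf;
        ssize_t w = write(fd, buf, n);
        if (w < 0)
            return -1;
        len -= (size_t)w;
    }
    return 0;
}

/*
 * Map the whole file shared + read-only and count the bytes that do not
 * match `fill`. Returns the mismatch count, or -1 on error.
 */
static long check_mapped(const char *path, char fill)
{
    struct stat st;
    long bad = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != fill)
            bad++;
    munmap((void *)p, len);
    return bad;
}

/*
 * Hypothetical driver: append `chunk` bytes per round (an odd size keeps
 * the end of the file off block boundaries) and re-check through a fresh
 * shared read-only mapping after every append, with the writer's
 * descriptor still open. Returns total mismatches, or -1 on error.
 */
long run_append_check(const char *path, int rounds, size_t chunk)
{
    long total_bad = 0;
    int fd;

    assert(chunk > 0);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    for (int i = 0; i < rounds; i++) {
        long bad;

        if (append_bytes(fd, 'A', chunk) < 0) {
            close(fd);
            return -1;
        }
        bad = check_mapped(path, 'A');
        if (bad < 0) {
            close(fd);
            return -1;
        }
        total_bad += bad;
    }
    close(fd);
    unlink(path);
    return total_bad;
}
```

A nonzero return from something like run_append_check("/tmp/mmap_append_test.dat", 50, 777) would mean the mapped view disagreed with what was written. A zero return proves little, since the problem in this thread showed up under real feed load with separate writer and reader processes.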