Date:      Sun, 26 Jul 1998 19:42:29 -0500
From:      Karl Denninger  <karl@mcs.net>
To:        Matthew Dillon <dillon@backplane.com>
Cc:        current@FreeBSD.ORG
Subject:   Diablo corruption on the filesystem
Message-ID:  <19980726194229.43215@mcs.net>
In-Reply-To: <199807270026.RAA09393@apollo.backplane.com>; from Matthew Dillon on Sun, Jul 26, 1998 at 05:26:11PM -0700
References:  <199807270026.RAA09393@apollo.backplane.com>

On Sun, Jul 26, 1998 at 05:26:11PM -0700, Matthew Dillon wrote:
> :Something funny is going on here.
> :
> :Removing the realtime, but NOT putting a Q1 on the feeds did NOT fix it.
> :Specifically, IMMEDIATELY on startup of a new set of dnewslink processes
> :I'd get a whole batch of errors on files that were still open for write in
> :the diablo processes at the time.
> :
> :The q1 seems to prevent that from happening (diablo has moved on to a new
> :file before dnewslink starts up against the old queue files).
> :
> :--
> :-- 
> :Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
> :http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
> 
>     .... or the q1 is resulting in a long enough delay that Diablo's write
>     position is well beyond the point where dnewslink picks up the files,
>     and so is (potentially) beyond the point where a last-block-is-fragment
>     problem would mess up mmap().

That's not the whole problem.

>     It sure sounds to me like a mmap()/file-fragment-block-allocation
>     inconsistency that occurs near the 'end' of the file when one process has
>     the end-portion of the file mmap'd shared+ro and another is actively
>     appending to the file.

Check THIS out.

I turned off USE_PCOMMIT_SHM, USE_PCOMMIT_RW_MAP, USE_KP_RW_MAP and
DO_COMMIT_POSTCACHE and recompiled diablo (but did NOT reinstall dnewslink,
if that matters).

This should shut down as much of the MMAPping as I can in diablo itself.

Now, with q1 set on the feeds, the problem appears to be completely GONE!

With those options ON, the error rate decreased but there was still
corruption happening.  

Now there is not.

I tried the recompile last night with REALTIME still enabled on a hunch, and
while that cut the error rate down, I was still getting errors - so I assumed
that whatever I had done with the options had been ineffective and that I was
instead seeing an artifact of lower-than-normal news rates (since it's the
weekend).

However, with the options off and q1 set, I'm not getting any errors (20
minutes now with no errors logged).

Something significant happened when I changed the behavior of diablo
internally.

Does this shed any light on the issue?  What have I actually done in terms
of the calls made to the mmap routines by disabling these options?

If it runs error-free for an hour or so I'm going to try removing the q1
and see if the problem reappears (leaving the modified diablo program
in service).
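
The scenario quoted above (the end of a file mapped shared and read-only
while the file is still being appended to) can be approximated with a small
consistency check. This is only a sketch under stated assumptions, not Diablo
or dnewslink code: the function name, the temporary file, the 700-byte record
size and the round count are all made up for illustration, and it uses two
handles in a single process rather than two separate processes, so it may not
trigger the problem described in this thread. It relies on the Unix-only
flags of Python's mmap module.

```python
import mmap
import os
import tempfile


def check_tail_mapping(record=b"x" * 700, rounds=50):
    """Append records with write() and re-read the file through a shared,
    read-only mapping after each append. Returns the number of rounds in
    which the mapped bytes differed from what was written.

    Hypothetical helper: names and sizes are illustrative assumptions.
    """
    mismatches = 0
    fd, path = tempfile.mkstemp(prefix="queue-test-")
    try:
        expected = b""
        # Writer: unbuffered append handle. Reader: separate read-only handle.
        with os.fdopen(fd, "ab", buffering=0) as writer, \
                open(path, "rb") as reader:
            for _ in range(rounds):
                writer.write(record)
                expected += record
                # Map the whole file so far, shared and read-only, so the
                # mapping always covers the most recently appended tail.
                view = mmap.mmap(reader.fileno(), len(expected),
                                 flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
                try:
                    if view[:] != expected:
                        mismatches += 1
                finally:
                    view.close()
        return mismatches
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("mismatching rounds:", check_tail_mapping())
```

On a system where write() and shared mappings stay coherent, the count should
be zero; a non-zero count would point at the kind of end-of-file disagreement
being discussed here.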

--
-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly / All Lines K56Flex/DOV
			     | NEW! Corporate ISDN Prices dropped by up to 50%!
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost



