Date: Sun, 16 Aug 1998 14:31:59 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: dg@root.com
Cc: tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net
Subject: Re: Better VM patches (was Tentative fix for VM bug)
Message-ID: <199808161431.HAA16709@usr08.primenet.com>
In-Reply-To: <199808161315.GAA00184@implode.root.com> from "David Greenman" at Aug 16, 98 06:15:30 am

> >John also suggested another change for detecting a case where page
> >orphans can be created.  in vm_page.c, it's possible that an object
> >entry in the page insert function will overwrite an existing entry;
> >I added a DIAGNOSTIC panic to catch this when it happens.
>
>    You mean that there can be multiple pages at the same offset? It would
> be bad if that happened, but I'm skeptical that it actually does.

To elaborate: I think that the behaviour I am seeing in the "mmap'ed
file contents, on a page boundary, written to another file" case is
a result of a single page being in multiple maps.  The code that I
have provided so far fails to address this bug.

I think a rather elaborate diagnostic (which I am constructing on my
home machine) is about the only chance of detecting something like
this; each page allocation and deallocation must be resource tracked,
at great expense.

A bug like this is really the only explanation for non-zeroed page
contents from one file appearing in another file; there are other
symptoms, but they are attributable to other bugs.  This is the only
bug that satisfies this case and all the others, all at the same time.

Left only the improbable, however unlikely... well, Arthur Conan Doyle
fans know the rest...

Initially, I thought this was 486dx4 specific (per Julian); however,
recent information has come to light that indicates that this is a
general problem (i.e., it is not a noise problem on my hardware); that
is, someone has reproduced the problem on Cyrix and Pentium hardware.

Right now, I am concentrating on the mmap code (the password file is
mmap'ed, and that is generally what shows up in the crontab as the
corrupt page contents) and the TLB shootdown code (which could also
account for the problem, though it would take a really bizarre set of
circumstances to lead to this...).

Basically, it's code review time... kill the obvious races and see
what's left.  8-(.

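For illustration only, the kind of tracking described above (checking every page insert and removal for an overwritten entry or a doubly-owned page) might be sketched in userland C roughly as follows. This is not the DIAGNOSTIC code added to vm_page.c and not the diagnostic under construction; every name here (track_insert, track_remove, TRACK_SLOTS, the result codes) is hypothetical, and the fixed-size linear table is an assumption made to keep the sketch short. A real kernel also has legitimate page sharing that this toy does not model; it simply reports any page recorded under two (object, offset) keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_SLOTS 64	/* toy capacity; hypothetical */

enum track_result {
	TRACK_OK = 0,
	TRACK_DUP_OFFSET,	/* (object, offset) already has a page */
	TRACK_PAGE_SHARED,	/* page already recorded under another key */
	TRACK_FULL,		/* tracking table exhausted */
	TRACK_MISSING		/* removal of an entry that was never recorded */
};

struct track_entry {
	const void *object;	/* stand-in for the owning VM object */
	uint64_t    offset;	/* offset of the page within the object */
	const void *page;	/* stand-in for the page itself */
	int         in_use;
};

static struct track_entry track_table[TRACK_SLOTS];

/* Record that `page` now backs (object, offset), or report why it cannot. */
static enum track_result
track_insert(const void *object, uint64_t offset, const void *page)
{
	struct track_entry *free_slot = NULL;
	size_t i;

	assert(object != NULL && page != NULL);
	for (i = 0; i < TRACK_SLOTS; i++) {
		struct track_entry *e = &track_table[i];

		if (!e->in_use) {
			if (free_slot == NULL)
				free_slot = e;	/* remember first free slot */
			continue;
		}
		if (e->object == object && e->offset == offset)
			return TRACK_DUP_OFFSET;  /* would overwrite an entry */
		if (e->page == page)
			return TRACK_PAGE_SHARED; /* page owned twice */
	}
	if (free_slot == NULL)
		return TRACK_FULL;
	free_slot->object = object;
	free_slot->offset = offset;
	free_slot->page = page;
	free_slot->in_use = 1;
	return TRACK_OK;
}

/* Forget the record for (object, offset, page); all three must match. */
static enum track_result
track_remove(const void *object, uint64_t offset, const void *page)
{
	size_t i;

	for (i = 0; i < TRACK_SLOTS; i++) {
		struct track_entry *e = &track_table[i];

		if (e->in_use && e->object == object &&
		    e->offset == offset && e->page == page) {
			e->in_use = 0;
			return TRACK_OK;
		}
	}
	return TRACK_MISSING;
}
```

A caller would treat anything other than TRACK_OK as the point to panic or log; the linear scan on every operation is the "great expense" part, which is why something like this would only be enabled in a diagnostic build.
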
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message