From owner-freebsd-current Sun Aug 16 22:38:31 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id WAA20088 for freebsd-current-outgoing; Sun, 16 Aug 1998 22:38:31 -0700 (PDT) (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id WAA20081 for ; Sun, 16 Aug 1998 22:38:29 -0700 (PDT) (envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id WAA01440; Sun, 16 Aug 1998 22:37:55 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209) via SMTP by smtp01.primenet.com, id smtpd001399; Sun Aug 16 22:37:46 1998
Received: (from tlambert@localhost) by usr09.primenet.com (8.8.5/8.8.5) id WAA19870; Sun, 16 Aug 1998 22:37:37 -0700 (MST)
From: Terry Lambert
Message-Id: <199808170537.WAA19870@usr09.primenet.com>
Subject: Re: Better VM patches (was Tentative fix for VM bug)
To: nate@mt.sri.com (Nate Williams)
Date: Mon, 17 Aug 1998 05:37:37 +0000 (GMT)
Cc: dg@root.com, tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net
In-Reply-To: <199808170217.UAA04040@mt.sri.com> from "Nate Williams" at Aug 16, 98 08:17:30 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > I have a suggestion. Let's not throw out random guesses about what may or
> > may not be a problem. Let's actually understand the issue thoroughly, come up
> > with a fix, and then tell people all about it.
>
> Actually, I'm with Terry here. I think throwing out random guesses is a
> *much* better solution than what's occurred so far. At least this way
> folks have a clue about what *might* be going on, and some of the
> 'random guesses' may trigger someone's mind.

Actually this is too adversarial.

There is a real problem with the vnode_pager_alloc; it should *NOT*
set the actual size of the backing file to something other than the
actual size of the backing file. I think I cleared up the
misunderstanding caused by my inability to communicate *why* this
was a problem in my initial post.

My "wild guess" that fits the most problems is that there is a page
that is multiply referenced (or an object; a page makes more sense to
me because of the symptoms I've seen). This is a read-cache bug
(which is why I initially asked that someone with the SIG-11 or the
zeroed-page bugs compile their kernel NO_SWAPPING).

> The lack of progress on these bugs from the kernel hackers until Terry
> makes up an 'educated guess' seems to be a good motivator. :) ;)

I think the problems are more severe than are generally thought, but
are very infrequent.

I'm pretty sure that, until my last post, I had given David the
impression that the file corruption I was seeing was partial page
corruption of a file that ended before a page boundary. In fact, I
was seeing corruption beginning on a page boundary, and extending for
4k (or the end of the file, whichever came first).

I don't think anyone has been very good at communicating these bugs,
or their severity ("What idiot would extend a file that has been
mmap'ed without redoing the mapping?", etc.).

David's patches for the NFS problem were well thought out. I don't
think he needed me poking him to find them. 8-).

The reason I did the backup-one patch at all was that I was looking
for a panacea; a multiply referenced page, however it has occurred,
is about the only thing that can explain my problem, other than bad
hardware (which I refuse to believe, since "it worked before").

While trodding down the mmap path after the backup-one failed to
perturb Karl's bug or result in a "freeing free page" panic, I found
the mmap backing object end-of-file problem.
This actually doesn't help me; I am still hunting my
normally-accessed-file corrupted by contents of
mmap'ed-file-from-different-process bug, and I may still be looking
for the "pages zeroed at random" problem.

No one who has this problem has enabled DIAGNOSTIC with the new patch
to see if the insert is stomping things, so I can't tell if John
fixing the bogus-invalid-during-cleanup bug was all that was
necessary for that. 8-(.

Anyway, after all that, I am actually very happy to be using the
-current list as something other than an overflow from -ports or
-questions or -I-didn't-read-the-FAQ. So shoot me. 8-).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
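
For anyone trying to tell the two corruption patterns described above apart (partial-page damage near the end of a file versus damage that begins on a page boundary and runs for 4k or to end of file), a small user-space check along the following lines might help. This is only a sketch: the name differing_pages and the idea of comparing a known-good copy against the damaged file are assumptions for illustration, not anything from the patches under discussion, and the 4096-byte page size is assumed from the "4k" figure in the post.

```python
PAGE_SIZE = 4096  # assumption: 4k pages, matching the size mentioned in the post


def differing_pages(good, bad, page_size=PAGE_SIZE):
    """Return (offset, length, starts_on_boundary) for each page that differs.

    offset and length describe the page itself (the last page may be short
    when the file ends before a page boundary). starts_on_boundary is True
    when the very first byte of the page already differs, which is what
    damage "beginning on a page boundary" would look like.
    """
    size = max(len(good), len(bad))
    report = []
    for start in range(0, size, page_size):
        end = min(start + page_size, size)
        a, b = good[start:end], bad[start:end]
        if a != b:
            starts_on_boundary = a[:1] != b[:1]
            report.append((start, end - start, starts_on_boundary))
    return report


if __name__ == "__main__":
    # Hypothetical data: 10240 bytes = two full pages plus a 2048-byte tail.
    good = bytes(range(256)) * 40
    bad = bytearray(good)
    # Clobber from the third page boundary through end of file.
    bad[8192:] = b"\xff" * (len(bad) - 8192)
    print(differing_pages(good, bytes(bad)))  # prints [(8192, 2048, True)]
```

A report where every entry has starts_on_boundary set is consistent with whole-page damage; an entry with it clear points at damage that starts somewhere inside a page. It cannot say anything about the cause.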