From: Peter Wemm
To: current@FreeBSD.ORG
Subject: Re: Memory corruption in -CURRENT [was Re: Plea to committers to only commit to HEAD if you run -current {from developers@FreeBSD.org}]
Date: Fri, 23 Aug 2002 01:32:02 -0700
Message-Id: <20020823083202.900A12A7D6@canning.wemm.org>
In-Reply-To: <20020823063155.GA215@HAL9000.homeunix.com>

David Schultz wrote:
> Thus spake Terry Lambert:
> > David Schultz wrote:
> > > Thus spake Terry Lambert:
> > > > DISABLE_PSE is a 1:6 probability; DISABLE_PG_G is a 1:100 (both
> > > > estimates, but on that order), so mixing and matching them will
> > > > not usually give any additional information.  Martin got "lucky"
> > > > with his machine... it seems to require both.
> > > >
> > > > The problem is a hardware bug in most Pentium on up processors,
> > > > which gets worse in newer CPUs (P4, AMD) as they try to optimize
> > > > certain things.  It's like writing ANSI C without "volatile".
> > >
> > > It sounds like you're describing a cache coherence problem.  Could
> > > you elaborate or point me to a reference on this?  Thanks.
> >
> > There is no reference on this.  It is an undocumented hardware bug.
>
> Err...so you know there's a long-standing random bug that it has
> to do with 4 MB pages, but nobody has bothered to characterize it
> after all these years?  This sounds much like the problem Linux
> had with 4 MB pages and AGP GART on Athlons, where the hardware
> designers maintained that it was a `feature', not a bug, and that
> the software people were relying on undocumented behavior.

I know of one bug we were running into that basically boils down to 'do
not point a 4MB page at physical address zero or funny things happen'.
This particular one affects pentium pro and the older pentium 2 systems.
I have finally fixed this particular problem and am testing it out now
on two troublesome systems that I have available.  I have a vague
suspicion that it just *might* be a factor in the current round of
problems on UP pentium4's.

Terry claims to have diagnosed another bug but says he will not tell
anybody what it is or how to work around it.

There are also fundamental races in the pmap code when page tables are
shared.  We've fixed some of the bugs, but there are more still. :-(

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
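
For readers who want the 'do not point a 4MB page at physical address
zero' rule spelled out, here is a rough sketch of what such a guard
could look like.  This is not the actual FreeBSD pmap fix, which is not
shown in this message; can_use_4m_page(), the PAGE_4M_* constants and
demo_checks() are hypothetical names made up for illustration, and the
sketch assumes the only conditions are alignment, length, and the
address-zero exclusion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size and offset mask of a 4 MB page (PSE, non-PAE i386 paging). */
#define PAGE_4M_SIZE ((uint64_t)4 << 20)
#define PAGE_4M_MASK (PAGE_4M_SIZE - 1)

/*
 * Hypothetical helper: decide whether the physical range starting at
 * 'pa' and spanning 'len' bytes may be mapped with a single 4 MB page.
 * Assumption: the workaround is simply "never use a 4 MB page whose
 * physical base is zero"; such a range would fall back to 4 KB pages.
 */
static bool
can_use_4m_page(uint64_t pa, uint64_t len)
{
	if ((pa & PAGE_4M_MASK) != 0)
		return false;	/* base must be 4 MB aligned */
	if (len < PAGE_4M_SIZE)
		return false;	/* range must cover a whole 4 MB page */
	if (pa == 0)
		return false;	/* the workaround: skip physical address zero */
	return true;
}

/* Usage example: the first 4 MB is refused, the second is accepted. */
static void
demo_checks(void)
{
	assert(!can_use_4m_page(0, PAGE_4M_SIZE));
	assert(can_use_4m_page(PAGE_4M_SIZE, PAGE_4M_SIZE));
}
```

With a check like this, the first 4 MB of physical memory would be
mapped with ordinary 4 KB pages while everything above it could still
use 4 MB pages, so most of the TLB benefit is kept.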