Date:      Mon, 20 Oct 2003 16:38:19 +0100 (BST)
From:      Jan Grant <Jan.Grant@bristol.ac.uk>
To:        stable@freebsd.org
Subject:   Expert input required: P4 odd signals, no apparent memory fault,  DISABLE_PSE?
Message-ID:  <Pine.GSO.4.58.0310201625170.1903@mail.ilrt.bris.ac.uk>

I'm tracking -STABLE on a 1.8GHz P4 with 512MB of memory. Roughly since
the PAE changes were MFCed, I've been seeing memory-corruption-related
errors under specific circumstances: for example, a run of
	portsdb -fUu
can be guaranteed to generate SIGBUS, SIGILL and SIGSEGVs in a handful
of sh, sed, etc. processes.

However, reverting to a 4.8 kernel from prior to September either
masks these errors or no longer triggers them. The memory/mobo
_appears_ to check out OK under (ferinstance) extended runs of
memtest86.

Now, on -current I've seen reference to the DISABLE_PSE kernel option,
and some discussion that this behaviour may be due to a processor/timing
bug. So I have the following questions which I'd appreciate an expert
giving a definitive opinion on (I'm no x86/hardware hacker, me):

- are these problems potentially caused by this bug?
- what exactly does DISABLE_PSE do? (it's undocumented and a one-para
  explanation of the expected behaviour of this option would be
  appreciated)
- were any commits around the time of the MFC of the PAE code liable to
  have introduced problems into the kernel which this workaround might
  address?

I know it's a lot to ask, but both hardware and OS have been rock-solid
up until this point. Although I've not conclusively ruled out hardware
faults, the continued stability under high load of a pre-September 4.8
kernel makes me suspicious that this is more likely to be a bug getting
tickled than I'd normally suspect from these symptoms.

I'm about to experiment with this option but it currently feels a little
like cargo-cult admin. If there are any definitive tests that would
indicate if this hardware problem is present and addressed by this,
that'd be nice to know too.
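
Failing a definitive test, one way to make the kernel-to-kernel
comparison less anecdotal might be to tally the "exited on signal"
lines the kernel logs after each portsdb run. A minimal sketch,
assuming the messages take the usual "pid N (name), uid U: exited on
signal S" form; the function name tally_signals is made up for
illustration:

```python
import re
from collections import Counter

# Assumed kernel log format (an assumption, check it against your own logs):
#   pid 1234 (sed), uid 0: exited on signal 11 (core dumped)
SIGNAL_LINE = re.compile(
    r"pid \d+ \((?P<name>[^)]+)\), uid \d+: exited on signal (?P<sig>\d+)"
)


def tally_signals(lines):
    """Count (process name, signal number) pairs found in log lines."""
    counts = Counter()
    for line in lines:
        match = SIGNAL_LINE.search(line)
        if match:
            # Key on both the program and the signal so sh/SIGBUS and
            # sed/SIGSEGV are reported separately.
            counts[(match.group("name"), int(match.group("sig")))] += 1
    return counts
```

Passing it the open log file (/var/log/messages is assumed here) after
an identical portsdb run under each kernel would give per-process,
per-signal counts to set side by side. On FreeBSD, signal 4 is SIGILL,
10 is SIGBUS and 11 is SIGSEGV.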

Cheers,
jan

-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
"No generalised law is without exception." A self-demonstrating axiom.


