Date:      Wed, 24 Apr 1996 10:51:15 +0930 (CST)
From:      Michael Smith <msmith@atrad.adelaide.edu.au>
To:        scrappy@ki.net (Marc G. Fournier)
Cc:        current@FreeBSD.org, hackers@FreeBSD.org
Subject:   Re: Intelligent Debugging Tools...
Message-ID:  <199604240121.KAA13482@genesis.atrad.adelaide.edu.au>
In-Reply-To: <Pine.NEB.3.93.960423154553.23204B-100000@freebsd.ki.net> from "Marc G. Fournier" at Apr 23, 96 04:02:04 pm

Marc G. Fournier stands accused of saying:
> 
> 	What would it take to either create software for debugging
> hardware, and/or add appropriate debugging to the kernel that would
> improve debugging of hardware problems?

Ah.  As someone with a foot in both the hardware and software camps,
all I can say is "forget it".  Any software has to make a few
assumptions about the hardware it runs on.  If the hardware fails to
meet those assumptions (e.g. random parts of memory change), there's no
hope for the software.

To answer your question strictly, this sort of software does exist.
You find it in board-level test equipment with price tags starting in
the mid six figures.  Configuring such software usually requires
access to the manufacturer's specification for the DUT.  (If such
information actually exists in the first place - often it's easier for
a board vendor to just throw a prototype together, and if it runs
Windows, commit to manufacturing it.)

> 	Erk...as far as software is concerned, maybe something that
> you could run in single user mode that would completely thrash the
> RAM, doing read/writes to *all* the memory looking for any corruption?

"make world".  The issue here is that it's not _just_ memory, but the
interaction between processor memory accesses, busmastering activity,
refresh, chipset timing and random system noise.
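
For illustration only, here is roughly what the "thrash the RAM" idea
amounts to in C.  This is a minimal sketch, not an existing tool: the
names pattern_pass() and thrash() are made up for the example, and it
assumes the caller hands it an ordinary buffer.  It only exercises
processor writes and reads, and on a cached CPU most of the reads will
be satisfied from cache rather than DRAM, so a clean pass says very
little about the interactions listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill buf with a pattern derived from each word's index, then read it
 * back.  Returns the number of words that did not read back as written.
 * "volatile" stops the compiler from optimising the read-back away.
 */
size_t
pattern_pass(volatile uint32_t *buf, size_t nwords, uint32_t seed)
{
	size_t i, bad = 0;

	assert(buf != NULL);
	for (i = 0; i < nwords; i++)
		buf[i] = seed ^ (uint32_t)i;
	for (i = 0; i < nwords; i++)
		if (buf[i] != (seed ^ (uint32_t)i))
			bad++;
	return bad;
}

/*
 * Run the pass with each seed and its complement, so every bit of
 * every word is driven both high and low at least once.
 * Hypothetical helper; returns the total mismatch count.
 */
size_t
thrash(volatile uint32_t *buf, size_t nwords)
{
	static const uint32_t seeds[] = {
		0x00000000u, 0xffffffffu, 0xaaaaaaaau, 0x55555555u
	};
	size_t s, bad = 0;

	for (s = 0; s < sizeof(seeds) / sizeof(seeds[0]); s++)
		bad += pattern_pass(buf, nwords, seeds[s]);
	return bad;
}
```

Calling thrash() on a buffer and getting zero back shows only that
this access pattern, at this moment, happened to work.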

Simulating such an environment is _impossible_.  If the memory was
legitimately altered in an incorrect fashion (e.g. a bus latch was late
and caught data from the master as it transited out of a valid state,
and subsequently wrote it into memory), even ECC memory won't help
you.
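
The reason ECC can't catch this case can be sketched in a few lines of
C.  This is a toy model under loud assumptions: check_bits() is a
per-byte parity code standing in for a real ECC code, and mem_write()
and mem_read_ok() are hypothetical names, not any chipset's interface.
The point is only that the check bits are computed over whatever value
reaches the memory, so a value that was already wrong when it was
latched is stored with perfectly valid check bits.

```c
#include <stdint.h>

/* A data word as stored, together with its check bits. */
struct stored_word {
	uint32_t data;
	uint8_t  check;
};

/*
 * Toy check code: one even-parity bit per byte, packed into the low
 * four bits of the result.  A stand-in for real ECC, for illustration.
 */
uint8_t
check_bits(uint32_t w)
{
	uint8_t c = 0;
	int byte;

	for (byte = 0; byte < 4; byte++) {
		uint8_t b = (uint8_t)(w >> (8 * byte));

		/* Fold the byte down to a single parity bit in bit 0. */
		b ^= b >> 4;
		b ^= b >> 2;
		b ^= b >> 1;
		c |= (uint8_t)((b & 1u) << byte);
	}
	return c;
}

/* The memory stores whatever was latched, and encodes *that* value. */
struct stored_word
mem_write(uint32_t latched)
{
	struct stored_word s;

	s.data = latched;
	s.check = check_bits(latched);
	return s;
}

/* On read, the check only compares the data with its own check bits. */
int
mem_read_ok(struct stored_word s)
{
	return check_bits(s.data) == s.check;
}
```

If the word is damaged before mem_write() sees it, mem_read_ok()
reports success even though the stored data is not what the master
sent.  A bit that flips after the write, while the word sits in the
array, is the case such a check does catch.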

> 	As far as the kernel is concerned, I'm getting panics in VM
> and keep getting told it's hardware problems...fine, but there *has*
> to be a better way of isolating the problem than replacing bits and
> pieces until the problem seems to go away.  For instance, when I get
> a VM fault...what exactly *is* the problem?  Is it a problem with 
> the swap space (ie. hard drives) or RAM?

Find a spare $10K or so and buy a _good_ DRAM tester.  Discover, much
to your surprise, that most of the DRAMs on the market fail to operate
to spec.  Become Enlightened.  Purchase a pile of Triton-II
motherboards, fork out _lots_ of money for fast ECC memory, and
_maybe_ your problems will go away.

What is worth bearing in mind is that other people are doing
essentially the same things that you are doing, but aren't having the
problems you are.  They don't have access to any magical software
fixes, it's just that their (our) hardware appears to work OK.

> 	Does this make any sense?

Yes.  The problem is that PCs are built like toasters, and making a 
souffle' in a toaster is very difficult.

> Marc G. Fournier                                  scrappy@ki.net

-- 
]] Mike Smith, Software Engineer        msmith@atrad.adelaide.edu.au    [[
]] Genesis Software                     genesis@atrad.adelaide.edu.au   [[
]] High-speed data acquisition and      (GSM mobile) 0411-222-496       [[
]] realtime instrument control          (ph/fax)  +61-8-267-3039        [[
]] Collector of old Unix hardware.      "Where are your PEZ?" The Tick  [[