Date:      Wed, 2 May 2001 18:27:38 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        Riccardo.Veraldi@fi.infn.it
Cc:        freebsd-alpha@FreeBSD.ORG
Subject:   Re: vector: 0x670
Message-ID:  <200105021827.LAA29183@usr02.primenet.com>
In-Reply-To: <Pine.NEB.4.33.0105021309380.9571-100000@nikita.fi.infn.it> from "Riccardo.Veraldi@fi.infn.it" at May 02, 2001 01:10:39 PM

> 670     Processor Uncorrectable   unrecoverable cache or TLB errors, or
>                                   read of a non-existent I/O space
> 
> 
> Do you think it can be an error due to overheating or do I have to throw
> my AlphaStation away ??

No quick answers, but some things to try, and some advice and
opinions...


If it's an L1 cache error, the processor itself is bad; this
could be the result of overheating.  If it's L2 cache, the
cause could still be overheating, but I've also seen a lot of
people trying to use cache chips that were too slow.

- My guess is that it's _not_ a cache error.

- If it's overheating, that's usually the result of overclocking,
either intentionally, or unintentionally.  Make sure you are not
doing that.

You might also be using memory which is too slow, or which has
fake parity instead of real parity (don't do that).  Slow memory
tends to become more of an issue when you stick in a lot of it,
since the DMA refresh doesn't get around to each bank in time.
This is particularly problematic if you are doing heavy I/O:
the memory bus is latched for DMA, and refresh is delayed for
a long time by the bus hold times.  The hold time can usually
be adjusted in the driver or controller configuration, where
it is often called "bus on" time.

- I've occasionally loaded a machine with too much memory for
it to reasonably handle refresh, given the memory bus speed
and the bus-on time for some PCI controllers (I had an Adaptec
that was a bus hog; when I loaded the disk subsystem with the
extra RAM installed, the refresh failed, and the system lost
its mind).
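
As a rough illustration of the refresh argument above, here is a toy
model in Python.  It is not a description of any real memory
controller: the function name, the parameters, and all of the
figures are hypothetical, picked only to show how more memory and a
longer bus-on time push in the same direction.

```python
def refresh_fits(banks, rows_per_bank, refresh_op_ns, retention_ms,
                 bus_held_fraction):
    """Toy check: can every row in every bank be refreshed inside the
    retention window, given the share of time the bus is held for DMA?"""
    if not 0.0 <= bus_held_fraction < 1.0:
        raise ValueError("bus_held_fraction must be in [0, 1)")
    # Total refresh work grows with the amount of memory installed.
    work_ns = banks * rows_per_bank * refresh_op_ns
    # Time left over for refresh shrinks as bus-on time grows.
    available_ns = retention_ms * 1_000_000 * (1.0 - bus_held_fraction)
    return work_ns <= available_ns


# Hypothetical figures, chosen only to show the trend.
for banks in (4, 16):
    for held in (0.2, 0.6):
        ok = refresh_fits(banks, 4096, 500, 64, held)
        print(f"banks={banks:2d} bus held={held:.0%} refresh fits: {ok}")
```

With these made-up figures, only the combination of the most memory
and the longest bus hold fails the check, which is the pattern
described above: either pulling memory or shortening the bus-on time
brings it back inside the window.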

A TLB error means that the contents of a Translation Lookaside
Buffer are incorrect.  This could be a kernel bug.

- If you are running -current, this one would be my bet; in
particular, if you are trying to use SMP, it's nearly a certainty.

Alternatively, you may be using a bad driver for a board you
stuck in yourself: one which works on the x86 but hasn't been
tested on the Alpha, and which is accessing non-existent I/O
space.  Or the card itself is bad, and is not responding to
being addressed.

- If I had to bet, I'd say you stuck a sound card in the thing,
and it's choking (sound card drivers are notoriously fragile,
which is why that's my guess).

The easiest things to do, in order of increasing difficulty:

o	Make sure you are running cold, and see if the problem
	occurs while cold.  If it doesn't, cool the machine back
	down (e.g. leave it off in a cool room for a good long
	while), and then immediately after booting put it
	under a stress load.  This tells you whether the failure
	comes from load or from increased heat over time (it may
	be load that triggers it, but it looks like time because
	the aggregate load over time is high even though the
	instantaneous load is not).

o	Make sure you are not overclocking the CPU.

o	Yank all unnecessary cards, and see if you can repeat
	the problem.

o	Yank all unnecessary drivers from your config, rebuild
	a kernel, and try to repeat the problem.

o	Yank out as much memory as you can, and still be able
	to boot; if the problem still recurs, swap the memory
	still in there for the memory you yanked out, and try
	again.

o	Yank out the L2 cache; it will be slow, but it should
	still be able to work.

o	Try a different brand of disk controller.

o	Use the same disk controller, after sticking it in an
	x86 and resetting its CMOS settings to the factory
	defaults using the BIOS setup utility.

o	Replace the CPU, on the theory that it's shot, or the L1
	cache is fried on the thing.
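
The reasoning behind the first check in the list (cold boot, then
immediate stress) can be written down as a small decision rule.
This is only a sketch in Python; the function name and the
warmup_minutes parameter are hypothetical, and you would have to
estimate the warm-up time for your own box.

```python
def classify_failure(minutes_to_failure, warmup_minutes):
    """Interpret one run that was booted cold and stressed at once.

    minutes_to_failure: minutes from boot to the machine check, or
        None if the run never failed.
    warmup_minutes: assumed time for the machine to reach its normal
        operating temperature (a guess you supply).
    """
    if minutes_to_failure is None:
        return "not reproduced"
    if minutes_to_failure < warmup_minutes:
        # It died while still cold, so heat is an unlikely trigger.
        return "load"
    # It survived the load while cold and died only once warm.
    return "heat"


print(classify_failure(3, 30))
print(classify_failure(90, 30))
print(classify_failure(None, 30))
```

A single run proves little either way; repeat it a few times before
believing the answer.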


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message



