Date: Wed, 2 May 2001 18:27:38 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: Riccardo.Veraldi@fi.infn.it
Cc: freebsd-alpha@FreeBSD.ORG
Subject: Re: vector: 0x670
Message-ID: <200105021827.LAA29183@usr02.primenet.com>
In-Reply-To: <Pine.NEB.4.33.0105021309380.9571-100000@nikita.fi.infn.it> from "Riccardo.Veraldi@fi.infn.it" at May 02, 2001 01:10:39 PM
> 670 Processor Uncorrectable unrecoverable cache or TLB errors, or
> read of a non-existent I/O space
>
>
> Do you think it can be an error due to overheating or do I have to throw
> my AlphaStation away ??

No quick answers, but here are some things to try, and some advice
and opinions...

A cache error means that the processor is bad, if it's the L1 cache;
this could be the result of overheating. If it's the L2 cache, the
cause could still be overheating, but I've seen a lot of people trying
to use cache chips that were too slow.

        - My guess is that it's _not_ a cache error.

        - If it's overheating, that's usually the result of
          overclocking, either intentional or unintentional. Make
          sure you are not doing that.

You might also be using memory which is too slow, or which has fake
parity instead of real parity (don't do that). Slow memory tends to
become more of an issue when you put in a lot of it, since the DMA
refresh doesn't get around to each bank in time. This is particularly
problematic under heavy I/O: the memory bus is latched for DMA, so
refresh is delayed for a long time by bus hold times. This can usually
be adjusted in the driver or controller configuration, where it is
often called "bus on" time.

        - I've occasionally loaded a machine with too much memory for
          it to reasonably handle refresh, given the memory bus speed
          and the bus-on time of some PCI controllers. (I had an
          Adaptec that was a bus hog; when I put a heavy load on the
          disk subsystem with the extra RAM installed, refresh failed,
          and the system lost its mind.)

A TLB error means that the contents of a Translation Lookaside Buffer
are incorrect. This could be a kernel bug.

        - If you are running -current, this one would be my bet; in
          particular, if you are trying to use SMP, it's nearly a
          certainty.
Alternatively, you may be using a bad driver for a board you stuck in
yourself, one that works on x86 but hasn't been tested on the Alpha,
and it's accessing non-existent I/O space; or the card itself is bad
and isn't responding when addressed.

        - If I had to bet, I'd say you stuck a sound card in the
          thing, and it's choking. (Sound card drivers are
          notoriously fragile, which is why that's my guess.)

The easiest things to do, in order of increasing difficulty:

o   Make sure you are running cold, and see if the problem occurs
    while cold. If it doesn't, cool the machine back down (e.g.
    leave it off in a cool room for a good long while), and then,
    immediately after booting, put it under a stress load. This
    tells you whether the failure comes from load over time or from
    increased heat over time. (It may be load that triggers it, but
    it looks like time because aggregate load over time is high,
    while instantaneous load is not.)

o   Make sure you are not overclocking the CPU.

o   Yank all unnecessary cards, and see if you can repeat the
    problem.

o   Yank all unnecessary drivers from your kernel config, rebuild
    the kernel, and try to repeat the problem.

o   Yank out as much memory as you can while still being able to
    boot. If the problem still recurs, swap the memory still
    installed for the memory you yanked out, and try again.

o   Yank out the L2 cache; it will be slow, but it should still
    work.

o   Try a different brand of disk controller.

o   Use the same disk controller, after sticking it in an x86 box
    and resetting its CMOS settings to the factory defaults using
    the BIOS setup utility.

o   Replace the CPU, on the theory that it's shot, or that its L1
    cache is fried.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.