From owner-freebsd-current Thu Apr 25 12:58:36 1996
Return-Path: owner-current
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id MAA05184
          for current-outgoing; Thu, 25 Apr 1996 12:58:36 -0700 (PDT)
Received: from rocky.sri.MT.net (rocky.sri.MT.net [204.182.243.10])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id MAA05179
          for ; Thu, 25 Apr 1996 12:58:33 -0700 (PDT)
Received: (from nate@localhost)
          by rocky.sri.MT.net (8.6.12/8.6.12) id NAA19541;
          Thu, 25 Apr 1996 13:58:20 -0600
Date: Thu, 25 Apr 1996 13:58:20 -0600
From: Nate Williams
Message-Id: <199604251958.NAA19541@rocky.sri.MT.net>
To: "Marc G. Fournier"
Cc: Nate Williams, current@FreeBSD.org
Subject: Re: MotherBoard Jumper Settings...
In-Reply-To:
References: <199604251926.NAA19429@rocky.sri.MT.net>
Sender: owner-current@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> > Does that mean your box is now stable?  If so, that's *great* news.
> >
>       Nope, just that I'm going to submit new ones now that I
> think I've gone over everything with a fine toothed comb and caught
> any hardware mis-configurations I can find :)

OK, here's some advice.  Generally speaking, most folks *shouldn't* have
to go through this many steps, but in Marc's case, where he's having
problems that no one else is seeing, this might be helpful.

Step 0:
- Remove *ALL* NFS and DOS mounts from your system.  The NFS and DOS
  filesystems are slightly broken, and can cause weird problems.

Step 1:
- Disable *ALL* caches on your machine in the BIOS.  Set the memory wait
  states to the higher number and your bus speed to ~8 MHz (for ISA/EISA
  boxes).

Test, test, test, test, test.

Do the errors still occur?  If so, move on to step 2, else assume it's a
hardware problem, probably involving the L2 cache (motherboard and/or
memory) or BIOS setup.

[ Leave the caches disabled, just in case they are *also* a problem ]

Step 2:
- Make *SURE* (!!!!) that your SCSI cables are good and everything is
  terminated correctly.
  This means that there should be 2 termination points, one at each end
  of the chain.  Also, if you have external devices, remove them and
  terminate your SCSI card, just to rule out bad external SCSI cables
  (very common).  If you've got a scanner, remove it.  (Scanners are
  notorious for screwing things up under load.)

Test, test, test, test, test.

Do the problems still occur?  If so, move on to step 3, else assume it's
a hardware problem with SCSI termination and/or cabling.

Step 3:
- Remove *ALL* non-essential hardware from the system.  This means
  leaving a disk big enough for the OS and some sources, and the
  necessary cards.  Ultimately, this would mean having only a video
  card, a hard/floppy card, and possibly an Ethernet card.

Test, test, test, test, test.

Do the problems still occur?  Then it's still possible that it's
hardware, so move on to step 4, else assume it's a misconfigured card.

Step 4:
- Swap out your memory with known-good memory, your disk with a
  known-good disk, and your controller with a known-good controller.
  (Heck, go IDE at this point.)  Re-install FreeBSD to make sure none of
  the bits are corrupted from a previously bad hardware setup.

Test, test, test, test, test.

It *should* work now, because it was a hardware problem in the first
place, given the consistency and frequency of your problems.

Quick history note:

The original 'interim' (pre-FreeBSD, pre-WC) development machine was a
486/33 box that hosted the development when I was a student at Montana
State University.  This box (which is still in service today as my home
box) would occasionally get NMIs from faulty hardware under heavy load.
Most of the time it worked, but it was annoying.  Almost 3 years after I
got the box I finally got tired of it, and decided to replace the
motherboard.  Unfortunately, the board I got was DOA, but I noticed that
the new board had faster cache RAM than my original motherboard.
On a whim, I swapped out the cache on my old (working, but NMI-prone)
motherboard with the cache from the DOA board, and it seemed to work.
From that day on, I have been unable to produce an NMI on that board no
matter *what* the load.  The machine has been 'rock' stable ever since I
re-installed FreeBSD on it.

However, before I re-installed FreeBSD on it I got random crashes b/c of
FS corruption.  Binaries, directories, inodes, and all sorts of other
files were corrupt from the previous hardware misconfiguration.  So,
even after I fixed my hardware problems, I still got *random* crashes.
I backed up what I could of the data (using tar to avoid FS
corruptions), and then re-installed and restored all my previous files,
and I haven't had a crash on it since.  The only reboots occur when I
turn on my DAT drive to do backups, and then reboot to turn it off,
since I don't like to leave it on.


Nate
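
The "Test, test, test" steps above need some repeatable load that can show corruption rather than just a crash. The following is only a minimal sketch of one way to do that, not something from the original post: it assumes Python is available on the box, and the names `sha256_of` and `readback_test` are made up for illustration. It writes random data to a file, then re-reads it several times and compares checksums against the data as it was generated in memory.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def readback_test(directory, size_bytes=1 << 20, rounds=5):
    """Write random data into `directory`, then re-read it `rounds` times.

    Returns the number of rounds whose digest differed from the digest
    of the data as generated in memory (hypothetical helper, see lead-in).
    """
    data = os.urandom(size_bytes)
    expected = hashlib.sha256(data).hexdigest()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out to the disk
        mismatches = sum(
            1 for _ in range(rounds) if sha256_of(path) != expected
        )
    finally:
        os.remove(path)  # clean up the scratch file even if a read fails
    return mismatches


if __name__ == "__main__":
    bad = readback_test(tempfile.gettempdir())
    print("mismatched rounds:", bad)
```

Any non-zero result points at the memory, cache, controller, cable, or disk path. Note that a small file will most likely be re-read from the buffer cache, so this mostly exercises memory and cache; to push the reads out to the disk and SCSI chain, `size_bytes` would have to be larger than RAM.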