Date:      Thu, 25 Apr 1996 16:29:14 -0400 (EDT)
From:      "Marc G. Fournier" <scrappy@ki.net>
To:        Nate Williams <nate@sri.MT.net>
Cc:        current@FreeBSD.org
Subject:   Re: MotherBoard Jumper Settings... 
Message-ID:  <Pine.NEB.3.93.960425162005.2486D-100000@freebsd.ki.net>
In-Reply-To: <199604251958.NAA19541@rocky.sri.MT.net>


The following advice given by Nate will be saved and worked through
once I get a machine I can do this on :(  Right now, my -stable(?) 
machine is my main server, and having it panic and be down for a few
minutes is possible, while I reboot it...but disabling the NFS
mounts is something that I can't do on this machine ;(

I'm trying to get a machine in that I can put stable onto and then
load test that.

As far as NFS mounts are concerned...I'm running 3 FreeBSD boxes
right now, that are all sharing /usr/httpd from one machine so that
I can balance the hits going to each machine instead of it all relying
on one...if NFS mounts are so broken that I can't do that, then I'm
using the wrong OS (and note...I don't believe I am...) for a production
environment (nothing from you, Cat!)

Hopefully over the next week or so, I'll be able to take delivery of
that machine and can give it an appropriate pounding, which, if it passes,
I can move stuff off of the main server onto it and take the main server
offline, and pound that until the hardware problem is exorcised from the
machine.

Until I do take delivery of the new machine, I shall not report any more
problems, since each one I report seems to be hardware related, and is 
generally wasting everyone's time :(


On Thu, 25 Apr 1996, Nate Williams wrote:

> > > Does that mean your box is now stable?  If so, that's *great* news.
> > >
> > 	Nope, just that I'm going to submit new ones now that I
> > think I've gone over everything with a fine-toothed comb and caught
> > any hardware mis-configurations I can find :)
> 
> OK, here's some advice.  Generally speaking most folks *shouldn't* have
> to go through this many steps, but in Marc's case where he's having
> problems that no-one else is seeing, this might be helpful.
> 
> Step 0:
> - Remove *ALL* NFS and DOS mounts from your system.  The NFS and DOS
>   filesystems are slightly broken, and can cause weird problems.
> 
> Step 1:
> - Disable *ALL* caches on your machine in the BIOS.  Set the
>   memory wait states to the higher number and your bus speed to ~8MHz (for
>   ISA/EISA boxes).
> 
> Test, test, test, test, test.
> 
> Do the errors still occur?  If so, move onto step 2, else assume it's a
> hardware problem, probably involving the L2 cache (motherboard and/or
> memory) or BIOS setup.
> 
> [ Leave the caches disabled, just in case they are *also* a problem ]
> 
> Step 2:
> - Make *SURE* (!!!!) that your SCSI cables are good and everything is
>   terminated correctly.  This means that there should be 2 termination
>   points, one at one end and one at the other.  Also, if you have
>   external devices, remove them and terminate your SCSI card, just to
>   rule out bad external SCSI cables (very common).  If you've got a
>   scanner, remove it.  (Scanners are notorious for screwing things up
>   under load.)
> 
> Test, test, test, test, test.
> 
> Do the problems still occur?  If so, move onto step 3, else assume it's
> a hardware problem with SCSI termination and/or cabling.
> 
> Step 3:
> - Remove *ALL* non-essential hardware from the system.  This means
>   leaving a disk big enough for the OS and some sources, and necessary
>   cards.  Ultimately, this would mean only having a video card, hard/floppy
>   card, and possibly an ethernet card.
> 
> Test, test, test, test, test.
> 
> The problems still occur?  Then it's still possible that it's
> hardware, move onto step 4, else assume it's a misconfigured card.
> 
> Step 4:
> - Swap out your memory with known-good memory, your disk with a known-
>   good disk, and your controller with a known-good controller.  (Heck,
>   go IDE at this point.)  Re-install FreeBSD to make sure all the bits
>   aren't corrupted from a previously bad hardware setup.
> 
> Test, test, test, test, test.
> 
> It *should* work now, because it was a hardware problem in the first
> place, given the consistency and frequency of your problems.
> 
> Quick history note:
> 
> The original 'interim' (pre-FreeBSD, pre-WC) development was a 486/33
> box that hosted the development when I was a student at Montana State
> University.  This box (which is still in service today as my home box)
> would occasionally get NMI's from faulty hardware under heavy load.
> Most of the time it worked, but it was annoying.
> 
> Almost 3 years after I got the box I finally got tired of it, and
> decided to replace the motherboard.  Unfortunately, the board I got was
> DOA, but I noticed that the new board had faster cache ram than my
> original motherboard.  On a whim, I swapped out the cache on my old
> (working but NMI) Mboard with the cache from the DOA board, and it
> seemed to work.  From that day on, I have been unable to produce an NMI
> on that board no matter *what* the load.  The machine has been 'rock'
> stable ever since I re-installed FreeBSD on it.
> 
> However, before I installed FreeBSD on it I got random crashes b/c of FS
> corruption.  Binaries, directories, inodes, and all sorts of other files
> were corrupt from the previous hardware misconfiguration.  So, even
> after I fixed my hardware problems, I still got *random* crashes.  I
> backed up what I could of the data (using tar to avoid FS corruptions),
> and then re-installed and restored all my previous files and I haven't
> had a crash on it yet.  The only reboots occur when I turn on my DAT
> drive to do backups, and then reboot to turn it off since I don't like
> to leave it on.
> 
> 
> 
> 
> Nate
> 

Marc G. Fournier                                  scrappy@ki.net
Systems Administrator @ ki.net               scrappy@freebsd.org
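
For reference, the elimination procedure in Nate's quoted steps could be
written down roughly as below.  This is only a sketch: the STEPS table and
the diagnose() helper are hypothetical names made up for illustration, the
step-4 suspect is inferred from his closing remark rather than stated by
him, and step 0 (removing the NFS and DOS mounts) is treated as a
precondition because no test follows it in his list.

```python
# Hypothetical encoding of the quoted troubleshooting steps.
# Each entry: (what to change, what to suspect if the problems then stop).
STEPS = [
    ("step 1 (disable caches, max wait states, ~8MHz bus)",
     "L2 cache (motherboard and/or memory) or BIOS setup"),
    ("step 2 (check SCSI cables/termination, remove external devices)",
     "SCSI termination and/or cabling"),
    ("step 3 (remove all non-essential hardware)",
     "a misconfigured card"),
    # Assumption: the suspect below is inferred, not quoted from the advice.
    ("step 4 (swap memory, disk, controller; re-install)",
     "memory, disk, controller, or files corrupted by the old setup"),
]


def diagnose(still_failing):
    """Return a conclusion from the test results gathered so far.

    still_failing[i] is True if the problems persisted after STEPS[i].
    """
    for (action, suspect), failing in zip(STEPS, still_failing):
        if not failing:
            # First step after which the errors went away names the suspect.
            return f"stopped after {action}: suspect {suspect}"
    if len(still_failing) < len(STEPS):
        # Every step tried so far still fails, so move on to the next one.
        return f"still failing: continue with {STEPS[len(still_failing)][0]}"
    return "still failing after every step: not isolated by this procedure"


if __name__ == "__main__":
    # Example: errors persisted after step 1 but stopped after step 2.
    print(diagnose([True, False]))
```

Each later step assumes the earlier changes stay in place (e.g. the caches
stay disabled), which is why the results are read in order and the first
"stopped" result decides the suspect.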



