Date: Mon, 3 Jun 1996 16:49:13 +0300 (EET DST)
From: "Andrew V. Stesin" <stesin@elvisti.kiev.ua>
To: se@zpr.uni-koeln.de (Stefan Esser)
Cc: hardware@freebsd.org, doc@freebsd.org
Subject: Mystery has gone! Thanks! (How a non-obvious HW problem was solved)
Message-ID: <199606031349.QAA09685@office.elvisti.kiev.ua>
In-Reply-To: <199606022214.AA22260@Sisyphos> from "Stefan Esser" at Jun 3, 96 00:14:33 am
Dear Stefan and FreeBSD people,

it seems to me that I found a REAL solution to this. See below.

[... a configuration I'm talking about: ...]

# } A machine, our recently built firewall gateway to Internet,
# } is:
# } ATC-1425B mainboard, PCI, SiS 496/7 chipset;
# } 16Mb RAM;
# } AMD 5x133 CPU;
# } NCR 53c810 SCSI;
# } 1Gb Conner CFP1060S drive (recent, good one);
# } two modems on the onboard COMs (SLIP lines to the world);
# } 1 Ethernet card.
# }
# } OS: FreeBSD-stable as of late March.
# } Add-ons: IPfilter 3.0.3+ (by Darren Reed) as in-kernel IP filtering
# } facility, Squid 1.0beta7 WWW proxy cache daemon.
# }
# } The machine was experiencing spontaneous reboots from time to time.
# } Either silent reboots, or prefaced with messages from NCR driver
# } (like "NCR dead?").

[... kind explanations and suggestions mostly omitted ...]

# The main difference is that the 21041 is a PCI bus-master.

Yes, that's why I took it out -- my first guess was that this particular MB has some breakage in its PCI implementation that breaks busmastering PCI devices (isn't the NCR a busmaster, too, btw?). Now I see I was wrong.

# There have been other motherboards that did not work correctly
# with multiple PCI bus masters, but I have no idea about the SiS
# chip set being broken in such a way.

SiS 496/7-based MBs are "the line of choice" for 486 boards at our site. They're generally Ok -- not as fast as the ASUS SP3G (I have some experience with those, too, but they disappeared from stock recently), but they're stable and reliable. We have some older SiS boards from SOYO; the ATC-1425Bs are from a different vendor (also Taiwanese) and they do support the AMD 5x133. I have also seen ASUS boards with the SiS 496/7 (SP3) -- I didn't like them (only 2 RAM sockets, and they were unstable under FreeBSD, though people claim that was due to ancient BIOS firmware).

As for multiple busmasters in SiS boards...
We had a 4-ether router for a while, with: NCR, 2 'lnc' AMD PCI boards, and a Realtek PCI NE2000 clone. All 4 PCI slots were full. The Lance ethers are busmasters, supported by the ISA driver (the PCI NE2000 worked with the ISA 'ed' driver). The CPU was an AMD DX2/80. This monster was reliable and fast, but under peak loads it occasionally threw a couple of messages about failed DMA on lnc[01] and "NCR dead?". But the drivers performed a hardware reset, and it worked for weeks this way. Being a cautious person, I redesigned the network layout recently :) when the Realtek PCI NE2000 card died :-)))

My experience tells me that SiS 496/7 boards are Ok, reasonably "old" and stable, but they do not enjoy having their slots overloaded with peripherals. If you fill all ISA and PCI slots, be ready to get spontaneous crashes and hardware troubles (seen this on our UUCP mail host). Having at least one ISA and one PCI slot empty is Ok.

# Some systems did not work reliably with all PCI performance
# options enabled (e.g. PCI Burst Mode, Write Buffers, ...), and

As I was told by the hardware technical guys, these problems were pretty common half a year ago; recent BIOS revisions (Award, AMI) are improved and the problems (kind of?) went away.

# I have seen other reports where a high interrupt load made the
# kernel fail with the PC pointing into the NCR driver. But I do
# not think this necessarily points out a driver problem, since

You're 101% right.

[...]

# I've been using the NCR and a DEC 21040 based Znyx 312 for some
# time in my ASUS SP3G system, and never had the kind of trouble
# you see.

Our "approved" kind of HW setup is: SiS 496/7-based board, AMD 5x133 CPU, NCR 53c810, IBM SCSI drive(s), DEC 21040-based ether, any S3 868 video, other peripherals to your taste, 16+ megs of RAM. Cheap, solid and productive; I highly recommend it.

# If your system currently got any performance options enabled, I'd
# just try without them. Wait states added to memory and cache accesses
# and PCI setup to work without burst transfers should help find a
# possible hardware performance problem.

The final solution which I found: the SIMMs weren't of appropriate quality!!! despite being marked as 60ns!!!! WHAT A FSCK!!!! The DRAM chips on the SIMMs are Texas Instruments; detailed chip info is available upon request (in case anyone is interested).

The ATC-1425B has an "Auto configuration" option in BIOS setup. "Huh, it should be a pretty safe kind of setup, if it puts ISA to 7.159MHz!" -- I thought initially :) It was turned "on". After all kinds of fighting with the PCI setup options (performance degraded -- but still crashes!), here is what I did two days ago:

1. Turned "Auto config" in BIOS "off".
2. Set the ISA BUS clock to 33MHz/4 -- it's appropriate.
3. Added a _single_ (!) wait state to the BIOS timing which manages transfers between the L2 cache (btw, the L2 cache is 15ns on the ATC-1425B board) and main DRAM; I just changed it from 2 to 3. (The machine is up now; if someone needs the exact spelling of what this BIOS option is called -- ask.)

And -- YESS!!! the problem disappeared! (The machine stood up bravely under flood pings and TCP shoots from 3(!) other FreeBSD boxes, with disk activity artificially induced, for 48 hours non-stop; previously just 5-10 minutes of stress killed it.) The box is still up now, no more problems observed. (Probably I'll try to put a Lance ether into it, just as an experiment -- but I simply don't want to reboot it at all, it holds our Inet connection!)

Thanks to all you friends who supported me! Please accept my sincere apologies for taking your time! I hope my experience will be of some use for the Hardware Compatibility Guide which is now in preparation, and that people will benefit a bit from it.

--
With best regards -- Andrew Stesin.

+380 (44) 2760188 +380 (44) 2713457 +380 (44) 2713560

"You may delegate authority, but not responsibility."
Frank's Management Rule #1.