Date: Tue, 28 Jul 1998 15:24:35 -0400 (EDT)
From: "Robert G. Brown" <rgb@phy.duke.edu>
To: Mike Isely <isely@pobox.com>
Cc: AIC7xxx@FreeBSD.ORG
Subject: Re: Puzzle for Doug...
Message-ID: <Pine.LNX.3.96.980728151333.1869C-100000@drupadi.phy.duke.edu>
In-Reply-To: <Pine.BSI.3.95.980728124030.6005D-100000@nathan.enteract.com>

On Tue, 28 Jul 1998, Mike Isely wrote:

> Well since the aic7xxx hardware executes DMA on its own behalf, that sort
> of memory access might look "different" enough to the hardware to expose a
> latent race condition.  Certainly there's more memory contention going on
> with the aic7xxx stuff in the picture.

Good point.  I also am wondering if the high speed of the CPUs, the
memory and the U2 controller itself combine to reveal a race condition.
I just really believe that the race is in the driver.

> Such memory tests never amount to more than a quickie existence check.
> "Leaky" DRAM cells (if such a thing could happen) can't be picked up
> for example because it would take many many microseconds for the bit(s) to
> go bad.  BIOS memory scans run way too fast for that.

Again, if it were "raw" bad DRAM, the system simply wouldn't work
regardless of the presence/absence of the aic7xxx driver.  Something
else would be using the critical memory during boot and fail.  I like
your DMA/race/contention hypothesis below much better.

>
> >
> > The only way that I could see the problem being bad memory is if the
> > SDRAM they put in the systems is somehow marginal and occasionally
> > fails but ONLY IN A WAY THE AIC7XXX DRIVER TWEAKS!  And only on the
>
> Without any DMA devices active in the system, the memory activity is going
> to be limited to whatever the CPU causes.  Is there any known-DMA going on
> without the aic7xxx running?  With multiple independent (fast) devices
> initiating memory access, all sorts of contention issues can arise.  Of
> course, this is supposed to work, but without the aic7xxx stuff active you
> might not be beating on it hard enough to cause the trouble.  Remember the
> RZ1000 IDE problem a few years back?

Yeah, this occurred to me -- I have an eepro100 in the system and there
is indeed network traffic, especially during diskless boots.  It's
harder to see this as a problem in NON-diskless boots, though.
Also, the network device is formally probed and initialized only AFTER
the scsi device.  Finally, I unplugged the cable during a boot or two so
that it wasn't actually receiving packets during boot.  No effect.
Still, a definite possibility.

>
> Just fishing for ideas for ya.  I think a game of musical hardware is
> definitely the next step here.  But even that may not give conclusive
> results if something in Dell's configuration is "right on the edge".

And I appreciate it!  But *moan*...

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message
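
Editorial illustration (not part of the original message): the point in the thread that a quick BIOS memory scan cannot catch a "leaky" cell can be sketched with a toy model. Everything here is hypothetical: the `LeakyCell` class, the `scan_passes` helper, and the 50-microsecond retention figure are invented for illustration, time is simulated rather than measured, and DRAM refresh is ignored entirely. It is a sketch of the reasoning, not a real memory test.

```python
from dataclasses import dataclass


@dataclass
class LeakyCell:
    """Toy model of one memory bit that loses a stored 1 after a while."""
    retention_us: float           # hypothetical hold time, in simulated microseconds
    value: int = 0
    written_at_us: float = 0.0

    def write(self, bit: int, now_us: float) -> None:
        self.value = bit
        self.written_at_us = now_us

    def read(self, now_us: float) -> int:
        # A stored 1 reads back as 0 once the cell has sat idle
        # for longer than its retention time.
        if self.value == 1 and now_us - self.written_at_us > self.retention_us:
            return 0
        return self.value


def scan_passes(cell: LeakyCell, delay_us: float) -> bool:
    """Write a 1, let delay_us of simulated time pass, check the readback."""
    cell.write(1, now_us=0.0)
    return cell.read(now_us=delay_us) == 1


cell = LeakyCell(retention_us=50.0)
print(scan_passes(cell, delay_us=0.1))    # quick existence check: True
print(scan_passes(cell, delay_us=200.0))  # check after a long pause: False
```

In this model the same cell passes a write-then-read-immediately scan and fails only when the readback is delayed past its retention time, which is the sense in which a fast scan is only an existence check.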