Date: Sun, 20 Sep 1998 13:47:43 -0400 (EDT)
From: "Robert G. Brown" <rgb@phy.duke.edu>
To: Doug Ledford <dledford@dialnet.net>
Cc: felix@halef.rhrz.uni-bonn.de, aic7xxx@FreeBSD.ORG
Subject: Re: pre10 on a 2940u2w still shows BRKADRINT
Message-ID: <Pine.LNX.3.96.980920133224.8938D-100000@ganesh.phy.duke.edu>
In-Reply-To: <36033D4C.818CC987@dialnet.net>
On Sat, 19 Sep 1998, Doug Ledford wrote:

> Robert G. Brown wrote:
>
> > > > into a similar BRKADRINT bug. Whereas the pre5 would loop with
> > > >   Data-Path Ram Parity Error
> > > >   PCI Error Detected
> > > >   (scsi 0) SEQADDR = 0x50
> > > > the pre10 kernel would loop with
> > > >   Data-Path Ram Parity Error
> > > >   (scsi 0) SEQADDR = 0x17d [the first time]
> > > >   (scsi 0) SEQADDR = 0x1 [subsequent times]
> > > ...
> >
> > The problem with a relatively new driver is that it is difficult to
> > separate out problems with the driver, problems with the hardware (yes,
> > some hardware IS just plain old defective), problems with the firmware,
> > and problems with the attached system BIOS and conflicts with other
> > devices on the PCI bus.
>
> In this case, I think this is a problem with the driver. However, just what
> that problem is isn't known yet :) I'm currently leaning towards the idea
> that I haven't *completely* written every memory location on those cards,
> and if you want to be completely anal about the Adaptec docs, you are
> supposed to write something to every location on those cards to initialize
> the parity bits. So my next driver version will be putting in code to
> hopefully hit more of the locations and get things set up properly. You
> never know, maybe this error is from a parity error in the data fifo.
> Anyway, as soon as I can reproduce the problem here, then I'll get it
> fixed. I'm working on that :)

Could such a problem survive a power-down? What you describe is certainly consistent with my early experiences with a "working" pre7 driver, but only if I booted NT at least once first (presumably it wrote everything and set all the parity bits?). However, I would have expected that a COMPLETE power-down (where I pull the plug in back and hit the power button one more time to drain the capacitors in the attempted ATX startup surge) would have reset everything to ground zero... Your difficulty reproducing it may be statistical.
Right now my first-order estimate of the odds of it occurring is 1/8, but per machine, not per bootup or power-down. You might try booting your system a bunch of times with pre3-pre7 -- it is quite possible that something in pre3 or pre4 is "toxic" enough that it corrupts the right places in a way that survives between boots (somehow). Is there any chance that one of these drivers could have overwritten the Adaptec BIOS, in whole or in part? (This may be a dumb question -- I don't really know how the BIOS flash works in the first place -- but if it is a matter of writing to the wrong part of the address space, it seems possible, which is why I ask.)

> > cards, and disks). As of the linux 5.1.0pre10, I seem to be able to
> > boot and run nearly all of them.
>
> Excellent :) I was hoping more of your machines would start working with
> pre-10.

All the ones that never booted or ran anything earlier than pre7 run fine. Hence my suggestion to try pre3 or the like. Of course, even 16 machines isn't much of a statistical universe, especially with one known hardware problem (the bad memory bit), although the Dell BIOS seems stable against this one -- it just marks down the total available memory. This won't help linux in the long run, since I have to tell it how much memory the system has by hand anyway, but it won't crash it immediately.

> >
> > If there IS a good suspicion that it is still a driver problem, I still
> > have two good (that is, bad:-) systems that I can try any solutions on.
>
> Well, I still think it's a driver problem. Essentially, my stance on the
> issue is that the driver should be able to work with the hardware regardless
> of BIOS bugs, since the only things we really use the BIOS for are a
> *very* select few items (such as the proper state of STPWRLEV in the
> DEVCONFIG register). These are machine/device dependent, so we can't init
> those reliably (although pre-11 allows you to pass special params to the
> driver to force these settings).
> Other than a few things like that, the driver should be able to work
> regardless of any BIOS bugs.

As always, if there is anything I can do to help things out, let me know. I could, for example, give you an account on our network and bring up one of the systems diskless (with no aic7xxx loaded). You could then work on a system that definitely has the problem whenever I am around to go down and hard-boot it if it gets stuck. With the system booted diskless from an RO root, you literally couldn't hurt anything I couldn't put back in two minutes. If you want to try this, send me a passwd line and I'll set it up tomorrow or Tuesday.

    rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-aic7xxx" in the body of the message