From owner-aic7xxx Tue Sep 22 07:48:03 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id HAA15843
        for aic7xxx-outgoing; Tue, 22 Sep 1998 07:48:03 -0700 (PDT)
        (envelope-from owner-aic7xxx@FreeBSD.ORG)
Received: from dialnet.net (mail.dialnet.net [206.65.248.7])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id HAA15786
        for ; Tue, 22 Sep 1998 07:47:59 -0700 (PDT)
        (envelope-from dledford@dialnet.net)
Received: from dledford.dialnet.net (dledford.dialnet.net [206.65.249.116])
        by dialnet.net (8.8.5/8.8.4) with SMTP id JAA32564;
        Tue, 22 Sep 1998 09:47:23 -0500
Date: Tue, 22 Sep 1998 09:47:21 -0500 (CDT)
From: Doug Ledford
To: "Robert G. Brown"
cc: aic7xxx Mailing List
Subject: Re: One more 2300 healthy (rats?)
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-aic7xxx@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue, 22 Sep 1998, Robert G. Brown wrote:

> Aww, and I already started to use the machine (yes, Virginia, I do
> actually do MC simulations on these boxes...when they work). Sigh.
> I guess I do need to document that this recent "fix" via freebsd boot
> survives powerdown. I'll try the following:
>
> a) Power down, etc. and reboot as described. My prediction is that it
> will now work fine, because pre10 worked fine on systems that had never
> been powered up before or that had been unplugged and cleared -- as
> long as they were not already displaying this "hung" behavior.
>
> I really do think that something weird is going on because I see
> differences in the boot-time behavior of nominally "identical" machines
> -- something that shakes my belief in electronic determinism (a thing
> that is none too strong anyway;-).
> Allowing that any notion of a
> WinDell "conspiracy" is nonsense (it was intended as a tongue-in-cheek
> joke in the first place, and now of course Dell is working actively with
> the linux community) there still appears to be solid evidence that there
> is a non-volatile location in the 7890 subsystem on these systems that
> survives total powerdown, the placement of the NVRAM-clearing jumper, an
> adaptec-bios reset (in the card bios itself) and the POST/initialization
> process, whatever it might be.

Those aren't the only possibilities. It's actually entirely possible that
the bug could now lie outside the aic7xxx driver in some of the more
generic linux kernel code that touches or affects this chipset.
Possibilities include the generic PCI initialization code, chipset setup
code, etc. It's possible that something in there could be causing these
problems.

> Here I'm at a disadvantage -- lacking
> device specs I cannot speculate where such a location might be or how it
> gets corrupted, but it does appear that it was corrupted in the Dells on
> delivery and gets reset by WinNT and now freebsd on boot, but not by
> pre10.

I've got the docs and I can't find any location that would cause this :)
Of course, one thing I don't think you've tried is forcing the PCI parity
checking off using the pci_parity boot option. That could possibly make a
difference.

> b) So, I'll also try to power down, etc. and reboot an earlier image,
> maybe pre3 or the like, that installed but then messed up. By looking
> at what a revision writes that "causes" the problem and what a revision
> writes that leaves the problem alone, it may be possible to find a
> location that was written to -- wrong -- that is now not written to at
> all and that needs to be written to right.

I have a better idea.

> It shall be carefully preserved in its dysfunctional state, except that
> I will let Dell replace the bad RAM. Hopefully it won't just suddenly
> start to work when they do...
I've made a few changes in preparation for pre11 (which isn't quite ready
yet), and some of those changes have to do with the aic7xxx=dump_card
option. I'll send you a copy of my current aic7xxx.c file off the list.
Use it to boot on a dysfunctional machine, write down the output from the
dump_card option, then boot the freebsd floppy, then reboot into the linux
code and run the dump_card again. If it magically starts working but the
dump_card output doesn't show a change, then the problem would *have* to
be in more generic code outside the aic7xxx driver.

Actually, you might have to do the dump_card boot, then boot again without
dump_card to make sure the machine still doesn't work, since dump_card
used to cause the driver to hang. I think I have that fixed; I just
haven't tested it yet. And after the FreeBSD boot disk you would also have
to boot the linux disk twice, once with and once without dump_card, to
test things (unless the second time around the dump_card boot goes ahead
and comes up, in which case we would know I fixed that problem as well).

--------------------------------------
 Doug Ledford
 Opinions expressed are my own, but
 they should be everybody's.
--------------------------------------

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-aic7xxx" in the body of the message