Date: Tue, 21 Jul 1998 19:38:30 -0500
From: Doug Ledford <dledford@dialnet.net>
To: "Robert G. Brown" <rgb@phy.duke.edu>
Cc: aic7xxx Mailing List <AIC7xxx@FreeBSD.ORG>
Subject: Re: 5.1.0pre4 work du jour
Message-ID: <35B53486.50263760@dialnet.net>
References: <Pine.LNX.3.96.980721182433.9017T-100000@ganesh.phy.duke.edu>

Robert G. Brown wrote:
>
> Dear Doug et. al.;
>
> I did work some on the remaining problem today, but didn't get very
> far.  Justin Gibbs sent me a copy of the current freebsd sources
> (thanks Justin) but going through the two versions didn't get me very
> far as there is significant drift between the two.

There is a lot of difference.  That's why these ports never happen in a
week :)

> I did find what I
> suspect is another bug; in the restart sequencer routine, I think that
> writing the sequencer address needs to be prefaced by FAILDIS -- I did
> notice another Data Parity Error occurring that might have been
> ameliorated by the following but didn't take time to be sure.

OK, before we get too far down this line, I want to set a few things
straight.  Why Justin is letting the sequencer run in FAILDIS|PERRORDIS
mode I'm not sure, but in my source code the sequencer doesn't do that.
Now, it's true that FAILDIS is needed prior to setting the sequencer
address back to 0, but in my code now, it reads like this (in the
loadseq function that originally started all this):

  aic_outb(p, FASTMODE|FAILDIS|PERRORDIS, SEQCTL);
  aic_outb(p, 0, SEQADDR1);
  aic_outb(p, 0, SEQADDR0);
  aic_outb(p, FASTMODE, SEQCTL);

Notice that as soon as the sequencer gets back to being in a valid
address range, we re-enable the error detection logic on the chip.
IMNSHO, to leave it disabled would be like having ECC DRAM and telling
the motherboard to ignore it because you are getting parity errors.  Is
there an actual reason for this code being that way in your driver,
Justin?  Something I'm not aware of?

The answer here isn't to disable the error checking, but to find the
problem, which is what I've spent all day looking for.  As far as the
2742/2842 class controllers are concerned, I found one thing that would
definitely have led to problems with the chip and caused errors like
these, and that's now fixed.
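
To make the ordering in the loadseq snippet concrete, here is a minimal sketch that runs against a mock register file instead of real hardware. Everything in it is an assumption for illustration: the `mock_host` structure, `mock_outb()`, `sketch_reset_seq_addr()`, and the register offsets and bit values are placeholders and should be checked against the driver's own register definitions. It only shows the shape of the idea: error detection is masked just long enough to write the sequencer address, and is back on when the routine returns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Placeholder offsets and bits, for illustration only. */
#define SEQCTL    0x60
#define SEQADDR0  0x62
#define SEQADDR1  0x63
#define PERRORDIS 0x80
#define FAILDIS   0x20
#define FASTMODE  0x10

/* Hypothetical stand-in for the host structure: registers are a byte array. */
struct mock_host {
    uint8_t regs[256];
    int unguarded_addr_writes;  /* SEQADDR writes made while FAILDIS was clear */
};

static void mock_init(struct mock_host *p)
{
    memset(p, 0, sizeof(*p));
}

/* Mock register write; same argument order as aic_outb(p, val, port). */
static void mock_outb(struct mock_host *p, uint8_t val, uint8_t port)
{
    if ((port == SEQADDR0 || port == SEQADDR1) &&
        !(p->regs[SEQCTL] & FAILDIS))
        p->unguarded_addr_writes++;
    p->regs[port] = val;
}

/* Reset the sequencer address to 0 with error detection masked only
 * for the duration of the two address writes. */
static void sketch_reset_seq_addr(struct mock_host *p)
{
    mock_outb(p, FASTMODE | FAILDIS | PERRORDIS, SEQCTL);
    mock_outb(p, 0, SEQADDR1);
    mock_outb(p, 0, SEQADDR0);
    mock_outb(p, FASTMODE, SEQCTL);

    /* Error detection must be enabled again on exit. */
    assert((p->regs[SEQCTL] & (FAILDIS | PERRORDIS)) == 0);
}
```

In the mock, `unguarded_addr_writes` stays at zero only if both address writes happen inside the FAILDIS window, which is the property the four-line sequence relies on.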

I'm now able to reproduce the SELTO bugs that some people have seen, and
once I get that corrected, I'll release a pre-5 and see how it goes.

> I'm not
> sure why the sequencer address has to be set here -- it is different
> in Justin's code.  Similarly, the linux version leaves
> restart_sequencer with the sequencer paused, in the freebsd version
> unpaused.  Anyway, I added this and it made things no worse:
>
> static inline void
> restart_sequencer(struct aic7xxx_host *p)
> {
>   /* Set the sequencer address to 0. */
> + aic_outb(p, FAILDIS | FASTMODE, SEQCTL);
>   aic_outb(p, 0, SEQADDR0);
>   aic_outb(p, 0, SEQADDR1);

Right, it made things no worse, but it's not what we want either.  See
above about that issue.  In my current code, restart_sequencer does no
more than set the two address registers.  It never touches SEQCTL at
all any more.  You're correct that my restart_sequencer doesn't unpause
automatically.  Any place that I want to restart the sequencer, I
control whether I want an immediate unpause or want to wait.  In some
cases, the restart_sequencer() call is convenient at a location where I
don't want things unpaused.

> Beyond that I tried a bunch of things, all to no avail.  It looks (to
> my untrained eye) like the controller is coming out of the setup phase
> in a paused state; what I imagine to be the first commands sent to the
> controller by the scsi layer don't seem to be executing and time out.

We specifically unpause the controller before we get that first
command.  If the controller goes back into a paused state after that
first command, then it would likely mean there is a problem somewhere
that's causing us to make the controller itself block.

> However, I haven't yet traced execution up into the scsi layer, and so
> far I bomb out with a timeout (or loop of timeout/resets) before the
> controller returns its attached devices to the driver.
>
> Hopefully Doug will be back and knows right where the problem lies; I
> will try again tomorrow, work permitting.

I wasn't gone, I was here all day :)  I just happened to be working in a
slightly different direction based on what you've told me and what I can
reproduce here.  In short, I've got a machine here that's broken with
5.1.0pre4+.  As long as that's true, then I know I don't have all of the
problems fixed.  Once I get this machine working, then I'll have a pre5
that should help people out.

-- 
 Doug Ledford  <dledford@dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message