Date:      Tue, 21 Jul 1998 19:38:30 -0500
From:      Doug Ledford <dledford@dialnet.net>
To:        "Robert G. Brown" <rgb@phy.duke.edu>
Cc:        aic7xxx Mailing List <AIC7xxx@FreeBSD.ORG>
Subject:   Re: 5.1.0pre4 work du jour
Message-ID:  <35B53486.50263760@dialnet.net>
References:  <Pine.LNX.3.96.980721182433.9017T-100000@ganesh.phy.duke.edu>

Robert G. Brown wrote:
> 
> Dear Doug et al.;
> 
> I did work some on the remaining problem today, but didn't get very
> far.  Justin Gibbs sent me a copy of the current freebsd sources
> (thanks Justin) but going through the two versions didn't get me very
> far as there is significant drift between the two.

There is a lot of difference.  That's why these ports never happen in a week
:)

>  I did find what I
> suspect is another bug; in the restart sequencer routine, I think that
> writing the sequencer address needs to be prefaced by FAILDIS -- I did
> notice another Data Parity Error occurring that might have been
> ameliorated by the following but didn't take time to be sure.

OK, before we get too far down this line, I want to set a few things
straight.  Why Justin is letting the sequencer run in FAILDIS|PERRORDIS mode
I'm not sure, but in my source code the sequencer doesn't do that.  Now,
it's true that FAILDIS is needed prior to setting the sequencer address
back to 0, but in my code now, it reads like this (in the loadseq function
that originally started all this):

  aic_outb(p, FASTMODE|FAILDIS|PERRORDIS, SEQCTL);
  aic_outb(p, 0, SEQADDR1);
  aic_outb(p, 0, SEQADDR0);
  aic_outb(p, FASTMODE, SEQCTL);

Notice that as soon as the sequencer gets back to being in a valid address
range, we re-enable the error detection logic on the chip.  IMNSHO, to leave
it disabled would be like having ECC DRAM and telling the motherboard to
ignore it because you are getting parity errors.  Is there an actual reason
for this code being that way in your driver, Justin?  Something I'm not aware
of?  The answer here isn't to disable the error checking, but to find the
problem, which is what I've spent all day looking for.  As far as the
2742/2842 class controllers are concerned, I found one thing that would have
definitely led to problems with the chip and caused errors like these, and
that's now fixed.  I'm now able to reproduce the SELTO bugs that some people
have seen and once I get that corrected, then I'll release a pre-5 and see
how it goes.
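
To make the point about the error-checking window concrete, here is a
minimal sketch against a mock register file.  It is not the driver's code:
mock_host, mock_outb and the helper names are hypothetical, and the bit
values and register offsets are assumptions picked for illustration.  It
only shows that FAILDIS|PERRORDIS are set while the sequencer address is
rewound and cleared again immediately afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values and register offsets, for illustration only; the real
 * ones come from the driver's register header. */
enum { PERRORDIS = 0x80, FAILDIS = 0x20, FASTMODE = 0x10 };
enum { SEQCTL = 0, SEQADDR0 = 1, SEQADDR1 = 2, NUM_REGS = 3 };

/* Hypothetical mock host: an array standing in for the chip's I/O space. */
struct mock_host {
    uint8_t regs[NUM_REGS];
    int unguarded_addr_writes; /* SEQADDR writes made without FAILDIS set */
};

/* Same argument order as the aic_outb() calls above: host, value, port. */
static void mock_outb(struct mock_host *p, uint8_t val, int port)
{
    if ((port == SEQADDR0 || port == SEQADDR1) &&
        !(p->regs[SEQCTL] & FAILDIS))
        p->unguarded_addr_writes++;
    p->regs[port] = val;
}

/* Disable checks, rewind to address 0, then re-enable checks right away. */
static void reset_seqaddr_guarded(struct mock_host *p)
{
    mock_outb(p, FASTMODE | FAILDIS | PERRORDIS, SEQCTL);
    mock_outb(p, 0, SEQADDR1);
    mock_outb(p, 0, SEQADDR0);
    mock_outb(p, FASTMODE, SEQCTL);
}

/* Returns nonzero if neither error-disable bit is set in SEQCTL. */
static int error_checking_enabled(const struct mock_host *p)
{
    return (p->regs[SEQCTL] & (FAILDIS | PERRORDIS)) == 0;
}

/* Small self-check tying the pieces together. */
static void demo(void)
{
    struct mock_host h = { {0}, 0 };
    reset_seqaddr_guarded(&h);
    assert(h.unguarded_addr_writes == 0);
    assert(error_checking_enabled(&h));
}
```

Calling demo() from a test harness exercises the sequence; the counter
stays at zero because both address writes happen inside the FAILDIS window.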

>  I'm not
> sure why the sequencer address has to be set here -- it is different
> in Justin's code.  Similarly, the linux version leaves
> restart_sequencer with the sequencer paused, in the freebsd version
> unpaused.  Anyway, I added this and it made things no worse:
> 
> static inline void
> restart_sequencer(struct aic7xxx_host *p)
> {
>   /* Set the sequencer address to 0. */
> +  aic_outb(p, FAILDIS | FASTMODE, SEQCTL);
>   aic_outb(p, 0, SEQADDR0);
>   aic_outb(p, 0, SEQADDR1);

Right, it made things no worse, but it's not what we want either.  See above
about that issue.  In my current code the restart sequencer code is no more
than setting the 2 addresses.  It never touches SEQCTL at all any more.

You're correct that my restart_sequencer doesn't unpause automatically.  Any
place that I want to restart the sequencer, I control whether I want an immediate
unpause, or if I want to wait.  In some cases, the restart_sequencer() call
is convenient at a location where I don't want things unpaused.
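
As a rough sketch of that split (again against a mock register file, not
the real driver): restart_sequencer() below only rewinds the address, and a
separate, hypothetical unpause_sequencer() helper is called by whichever
caller wants the sequencer running again.  The HCNTRL offset, the PAUSE bit
value and the mock_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register offsets and PAUSE bit, for illustration only. */
enum { HCNTRL = 0, SEQADDR0 = 1, SEQADDR1 = 2, NUM_REGS = 3 };
enum { PAUSE = 0x04 };

/* Hypothetical mock host: an array standing in for the chip's I/O space. */
struct mock_host {
    uint8_t regs[NUM_REGS];
};

static void mock_outb(struct mock_host *p, uint8_t val, int port)
{
    p->regs[port] = val;
}

/* Rewind the sequencer to address 0; deliberately leaves the pause state
 * alone so the caller decides when execution resumes. */
static void restart_sequencer(struct mock_host *p)
{
    mock_outb(p, 0, SEQADDR0);
    mock_outb(p, 0, SEQADDR1);
}

/* Hypothetical helper: clear the PAUSE bit so the sequencer runs again. */
static void unpause_sequencer(struct mock_host *p)
{
    mock_outb(p, (uint8_t)(p->regs[HCNTRL] & ~PAUSE), HCNTRL);
}

/* Returns nonzero while the PAUSE bit is set. */
static int is_paused(const struct mock_host *p)
{
    return (p->regs[HCNTRL] & PAUSE) != 0;
}
```

A caller that wants an immediate restart would call restart_sequencer()
followed by unpause_sequencer(); one that still has setup work to do calls
only the former and unpauses later.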

> Beyond that I tried a bunch of things, all to no avail.  It looks (to
> my untrained eye) like the controller is coming out of the setup phase
> in a paused state; what I imagine to be the first commands sent to the
> controller by the scsi layer don't seem to be executing and time out.

We specifically unpause the controller before we get that first command.  If
the controller goes back into a paused state after that first command, then
it would likely mean there is a problem somewhere that's causing us to make
the controller itself block.

> However, I haven't yet traced execution up into the scsi layer, and so
> far I bomb out with a timeout (or loop of timeout/resets) before the
> controller returns its attached devices to the driver.
> 
> Hopefully Doug will be back and knows right where the problem lies; I
> will try again tomorrow, work permitting.

I wasn't gone, I was here all day :)  I just happened to be working in a
slightly different direction based on what you've told me and what I can
reproduce here.  In short, I've got a machine here that's broken with
5.1.0pre4+.  As long as that's true, then I know I don't have all of the
problems fixed.  Once I get this machine working, then I'll have a pre5 that
should help people out.

-- 

 Doug Ledford  <dledford@dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.
