Date:      Thu, 30 Jul 1998 16:00:31 -0500
From:      Doug Ledford <dledford@dialnet.net>
To:        "Robert G. Brown" <rgb@phy.duke.edu>
Cc:        aic7xxx Mailing List <AIC7xxx@FreeBSD.ORG>
Subject:   Re: Precise point of bomb...
Message-ID:  <35C0DEEF.FA7037C1@dialnet.net>
References:  <Pine.LNX.3.96.980730112327.19553D-100000@ganesh.phy.duke.edu>

Robert G. Brown wrote:

> I had already deduced this (if down(&sem) were "bad", the kernel
> wouldn't work at all).  I do think that a hint may be that these
> systems have two very fast scsi controllers on one IRQ, as nobody with
> a single 7890 seems to be reporting this kind of error.  Could the
> interrupt that comes in during the down(&sem) be a re-entrancy problem
> like I just mentioned in the reply to Drs. Isley and Pirih?

Doubtful.  I test this driver on a fast machine with 4 fast SCSI controllers
(Dual PII-266 with 7895, 7890, and 7856 controllers).  At various points in
the testing, due to the IO-APIC stuff in the 2.1.x kernels, I've had all
four SCSI controllers on the same interrupt and never had re-entrancy
problems.

> Nothing matters for me today and tomorrow but to get these systems
> going.  I know that I'm probably just generating a lot of noise on the
> list, but I really am hoping that something I find with my printk's
> will help you isolate at least the precise circumstances under which
> the problem occurs, which may yield a clue as to where and why the
> sequencer code is being munged.

Well, here's one thing.  On my 2940U2W here at the house, I was wondering
why it would boot with the BIOS disabled, but not with the BIOS enabled.  I
noticed that every time it tried to perform an async transfer and hung, the
SPIOEN bit was *not* set in the SXFRCTL0 register.  I then added some code
to the sequencer to make *sure* that bit was set.  I then recompiled the
sequencer, recompiled the kernel, and rebooted.  The next time through,
instead of having the SPIOEN bit set in SXFRCTL0, it had set the SCAMEN
bit.  That's what makes me think there is an off-by-one error somewhere.  At
this point, I think the off-by-one might be in the generation of certain
constants.  IOW, there are certain bits out of the 32-bit instruction that
are reserved as the bits for an immediate constant.  I think those bits
might be off by one.  That would explain the SCAMEN bit being set in
SXFRCTL0 since the SCAMEN bit is one off from the SPIOEN bit.
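
To make the off-by-one suspicion concrete, here is a minimal sketch of how
a one-bit disagreement about where the immediate field starts would turn
SPIOEN into SCAMEN.  The instruction layout (IMM_SHIFT, IMM_MASK) and the
helper names are hypothetical and are not the real sequencer encoding; the
SPIOEN and SCAMEN values are as I recall them from aic7xxx_reg.h and should
be checked against the header.

```c
#include <assert.h>
#include <stdint.h>

/* SXFRCTL0 bit values as recalled from aic7xxx_reg.h (assumption). */
#define SPIOEN 0x08u
#define SCAMEN 0x04u

/* Hypothetical layout: an 8-bit immediate starting at bit IMM_SHIFT. */
#define IMM_SHIFT 8u
#define IMM_MASK  0xFFu

/* Pack an immediate into an instruction word at the given bit position. */
static uint32_t pack_imm(uint8_t imm, unsigned shift)
{
    assert(shift < 24u);
    return ((uint32_t)imm & IMM_MASK) << shift;
}

/* Extract the immediate, possibly from a different bit position. */
static uint8_t unpack_imm(uint32_t insn, unsigned shift)
{
    assert(shift < 24u);
    return (uint8_t)((insn >> shift) & IMM_MASK);
}

/* What would be seen if the two sides disagreed by one bit on the field. */
static uint8_t imm_seen_off_by_one(uint8_t imm)
{
    return unpack_imm(pack_imm(imm, IMM_SHIFT), IMM_SHIFT + 1u);
}
```

With both sides agreeing on IMM_SHIFT, SPIOEN round-trips unchanged.  With
the reader one bit further along, imm_seen_off_by_one(SPIOEN) comes back as
SCAMEN, which matches the symptom described above.  The direction of the
shift here is chosen only to reproduce that symptom.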

> > 1) Try disabling MMAPed I/O on your machines.  Somewhere around line 391 is
> > where we enable it for i386 architecture.
> 
> Just comment out #define MMAPIO, or do I need to comment out the
> definition of mb() as well?

The MMAPIO define is all that needs to be changed.
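
For readers unfamiliar with the pattern, a compile-time switch of this kind
usually looks roughly like the sketch below.  This is not the driver's
code: struct fake_host, reg_read and reg_write are hypothetical stand-ins,
and both "memory-mapped" and "port" spaces are simulated with plain arrays
so the example can be built in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Comment this out to build the port-I/O style path instead. */
#define MMAPIO

/* Hypothetical stand-in for a controller, not the driver's structure. */
struct fake_host {
    volatile uint8_t mmio[256];   /* pretend memory-mapped register window */
    uint8_t          ports[256];  /* pretend I/O port space */
};

/* Read one register through whichever path was compiled in. */
static uint8_t reg_read(struct fake_host *h, uint8_t reg)
{
    assert(h != 0);
#ifdef MMAPIO
    return h->mmio[reg];
#else
    return h->ports[reg];
#endif
}

/* Write one register through whichever path was compiled in. */
static void reg_write(struct fake_host *h, uint8_t reg, uint8_t val)
{
    assert(h != 0);
#ifdef MMAPIO
    h->mmio[reg] = val;
#else
    h->ports[reg] = val;
#endif
}
```

Because every access goes through the two helpers, removing the single
define switches the whole build to the other path without touching any
caller.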

> Sure, this is fine.  A very useful idea.  Either way, the information
> that the sequencer code is NOT being munged would be just as useful as
> the specific points where it is, and I might be able to alter the
> download code to download AND CHECK to be sure that the download
> completed correctly and repeat until it succeeds.  Self-checking
> surely beats hand checking.  Still, remember that I have two identical
> systems literally side by side and booting the same kernel from an
> identical boot floppy and loading the same module -- one works fine
> and the other fails.  Wherever the code is being munged is not
> deterministic from the code point of view -- it has to be somewhere
> where an assumption is made concerning latency or protection that is
> no longer correct for this particular hardware/interrupt
> configuration.  I don't expect this to just be a code typo, like the
> FAILDIS thing.

> I have Justin's code, and actually have already checked
> aic7xxx_loadseq (which appears ok).  I haven't looked at
> aic7xxx_check_patch or download_instr yet, but will soon.
> 
> So please send the listing and patch code (tarball attachment or
> uuencode or just a URL to the aic7xxx site are all OK -- I have a
> window open on the aic7xxx site nearly all the time now anyway).

OK.  I haven't got it up there yet, but I will in a few moments.
Specifically, I'll be uploading a tarball of the current sequencer assembler
and the related files that I use.  In the directory that it unpacks to, you
can modify the Makefile to suit your own path needs, then run "make list"
and it will generate a code listing for you.  I'll also include in that
directory a patch to the aic7xxx.c file that will cause the code actually
downloaded to the card to get printed out.  Expect that code within about 30
minutes.
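
The "download AND CHECK" idea quoted above, combined with printing what
actually landed on the card, could be sketched along these lines.  This is
only an illustration under stated assumptions: sequencer RAM is simulated
by an ordinary array, and seq_download, seq_verify and
seq_download_checked are hypothetical names, not functions from the driver
or from the patch mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Copy the instruction image into the (simulated) sequencer RAM. */
static void seq_download(uint32_t *ram, const uint32_t *image, size_t n)
{
    assert(ram != NULL && image != NULL);
    for (size_t i = 0; i < n; i++)
        ram[i] = image[i];
}

/*
 * Compare what was read back against what was sent.  Returns the index
 * of the first mismatching word, or -1 if everything matches.  When
 * verbose is nonzero, every word is printed for hand checking.
 */
static long seq_verify(const uint32_t *ram, const uint32_t *image,
                       size_t n, int verbose)
{
    long first_bad = -1;

    for (size_t i = 0; i < n; i++) {
        int bad = (ram[i] != image[i]);

        if (verbose)
            printf("%03zu: sent %08lx read %08lx%s\n", i,
                   (unsigned long)image[i], (unsigned long)ram[i],
                   bad ? "  <-- MISMATCH" : "");
        if (bad && first_bad < 0)
            first_bad = (long)i;
    }
    return first_bad;
}

/*
 * Download, read back, and retry up to max_tries times.  Returns the
 * number of attempts used on success, or -1 if no attempt verified.
 */
static int seq_download_checked(uint32_t *ram, const uint32_t *image,
                                size_t n, int max_tries)
{
    for (int t = 1; t <= max_tries; t++) {
        seq_download(ram, image, n);
        if (seq_verify(ram, image, n, 0) < 0)
            return t;
    }
    return -1;
}
```

In this simulation the download always succeeds on the first attempt.  On
real hardware the attempt count and the index of the first mismatch would
be the interesting output.  Note that a readback check can only catch
corruption that is visible through the read path, so a clean verify does
not by itself prove the sequencer executes what was sent.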

-- 

 Doug Ledford  <dledford@dialnet.net>
  Opinions expressed are my own, but
     they should be everybody's.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe aic7xxx" in the body of the message