Date:      Wed, 24 Sep 1997 00:20:10 +0200
From:      Stefan Esser <se@FreeBSD.ORG>
To:        Walter Hafner <hafner@forwiss.tu-muenchen.de>
Cc:        freebsd-scsi@FreeBSD.ORG, freebsd-hardware@FreeBSD.ORG, Stefan Esser <se@FreeBSD.ORG>
Subject:   Re: Is my NCR controller broken?
Message-ID:  <19970924002010.15006@mi.uni-koeln.de>
In-Reply-To: <199709180857.IAA03695@pccog4.forwiss.tu-muenchen.de>; from Walter Hafner on Thu, Sep 18, 1997 at 08:57:34AM +0000
References:  <199709180857.IAA03695@pccog4.forwiss.tu-muenchen.de>

On Sep 18, Walter Hafner <hafner@forwiss.tu-muenchen.de> wrote:
> Hello!

Hallo!

Sorry for the late reply ...

> I just want to make sure I don't miss something before changing my
> mainboard. Please enlighten me.
> 
> I run a 486/DX2-66 (ASUS SP-3 with onboard NCR-810 SCSI
> controller). This computer runs for about 3 years now (2.0.5, 2.1.0,
> 2.1.5)

Is this the original ASUS SP3 with the Saturn I
(revision 2) chipset?

That chipset is known to be buggy, and you'll have
to disable one of the PCI bus performance options.
I don't remember whether it was "PCI bursts" or
some buffer option ("write buffers"?).

> Since about four weeks I keep getting SCSI resets and then the bus is
> dead. No recovery! And it's really strange because the NCR controller
> reports totally different errors before hanging. Here are the error
> reports from the last three crashes (typed in by hand, so the actual
> format may differ):

Did you by chance do any of the following:

- modify PCI BIOS setup options (bursts, ...)
- add another PCI card (even a bus-master)
- add some ISA card
- change the amount of memory in the system

> -------------------------------------------------------------------------------
> 
> sd1(ncr0:1:0): internal error: cmd00 != 91=(vdsp[0] >> 24)
> ncr0: timeout ccb=f19fbc00 (skip)

This is a "can't happen" case, and this is the
first time I have seen it reported. A value in a
chip register differs from the data at the memory
address from which that register was loaded.
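
A minimal sketch of the kind of consistency check
behind this message, assuming the driver compares
the top byte of a 32-bit chip register with the top
byte of the word in memory it was loaded from. The
function name and the full 32-bit values are
hypothetical; only the bytes 0x00 and 0x91 come
from the report above.

```python
def top_byte(word: int) -> int:
    """Return bits 31..24 of a 32-bit word."""
    return (word >> 24) & 0xFF


def register_matches_memory(register_value: int, memory_word: int) -> bool:
    """True if the register still agrees with the word it was loaded from."""
    return top_byte(register_value) == top_byte(memory_word)


# Hypothetical values: the lower 24 bits are made up for illustration.
register_value = 0x00000000   # top byte 0x00, as in "cmd00"
memory_word = 0x91000000      # top byte 0x91, as in "91=(vdsp[0] >> 24)"

if not register_matches_memory(register_value, memory_word):
    print("internal error: %02x != %02x" % (top_byte(register_value),
                                            top_byte(memory_word)))
```

On sane hardware the two bytes are always equal,
which is why a mismatch points at the bus or the
chipset rather than at the drive.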

> -------------------------------------------------------------------------------
> 
> ncr0:1: ERROR (a0:0) (f-28-0) (8/13) @ (260:00000000).
>         script cmd=fc00001c.
>         reg:     da 10 80 13 47 08 01 1f 00 0f 81 28 80 00 00 00.
> ncr0: restart (fatal error).
> sd1(ncr0:1:0): command failed (9ff)@f19fbc00.
> nrc0: timeout ccb=f19fbc00 (skip)

Another indication of a hardware problem: The
NCR status has the bus fault bit set in DSTAT,
which indicates a problem accessing the PCI bus.
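
A rough sketch of how the status byte can be
decoded, assuming the "a0" in "ERROR (a0:0)" is the
DSTAT value and that the bit names below match the
53C810 data manual. Both are assumptions and should
be checked against the manual and the driver source.

```python
# Assumed DSTAT bit layout for the NCR 53C810 (verify against the data manual).
DSTAT_BITS = {
    0x80: "DFE (DMA FIFO empty)",
    0x40: "MDPE (master data parity error)",
    0x20: "BF (bus fault)",
    0x10: "ABRT (aborted)",
    0x08: "SSI (single step interrupt)",
    0x04: "SIR (SCRIPTS interrupt instruction received)",
    0x01: "IID (illegal instruction detected)",
}


def decode_dstat(value: int) -> list:
    """Return the names of all bits set in a DSTAT byte, highest bit first."""
    return [name for mask, name in DSTAT_BITS.items() if value & mask]


# 0xa0 is taken from the "ERROR (a0:0)" line in the report.
for flag in decode_dstat(0xA0):
    print(flag)
```

Under these assumptions 0xa0 decodes to DFE plus BF,
so the bus fault bit is the only error bit set.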

> -------------------------------------------------------------------------------
> 
> ncr0: SCSI phase error fixup: CCB already dequeued (0xf19fbc00)
> nrc0: timeout ccb=f19fbc00 (skip)

Hmmm, another "first" ...
There definitely is something wrong with your
hardware.

> I changed everything:
> 
> * disconnected everything except the system drive -> still errors
> * changed cables (three different ones) -> still errors
> * changed termination (two different external ones, internal, different
>   termpower settings etc.) -> still errors
> * turned all devices to 5MB synchr. and finally to async via
>   'ncrcontrol' -> still errors
> * finally replaced the system drive (old DEC 5200 against new IBM DAHC
>   34330) and put 2.2.1 on it -> still errors. Actually, the errors above
>   are from that setup.
> 
> The only thing I didn't change was the mainboard.

Well, and I think that's the problem :)
But please try with conservative PCI options.
This helped other people with an ASUS SP3, too.
I just don't remember the exact option that
caused the problem. Just disable all the PCI
performance options that the BIOS setup offers :)

> I'd be glad if anyone can confirm my suspicion that the NCR controller
> has gone nuts. I just can't imagine why ...

No, I don't think this is a controller going bad.
Though such a thing has happened before ...

> I'd also appreciate it very much if someone with more insight than
> myself could explain the error reports to me. I'd especially like to
> know what this 'f19fbc00' means: it shows up in all three errors (what's
> a 'ccb' anyway?)

The CCB is a Command Control Block, a structure
that contains all the information the NCR needs
to issue and execute a SCSI command. It is in
fact surprising that the same address is printed
in each case, but depending on the number of drives
and whether tags are enabled, it is possible that
only one CCB is in use.
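
To check quickly whether a set of messages really
refers to one CCB, the addresses can be pulled out
of the log lines. This is only a sketch: the three
sample lines are copied from the reports above, and
the regular expression is an assumption about where
the driver prints the address, not a description of
its actual format strings.

```python
import re

# Sample lines taken from the error reports quoted in this message.
LOG_LINES = [
    "ncr0: timeout ccb=f19fbc00 (skip)",
    "sd1(ncr0:1:0): command failed (9ff)@f19fbc00.",
    "ncr0: SCSI phase error fixup: CCB already dequeued (0xf19fbc00)",
]

# Assumed prefixes that precede a CCB address: "ccb=", ")@" and "(0x".
CCB_ADDR = re.compile(r"(?:ccb=|\)@|\(0x)([0-9a-f]{8})")


def ccb_addresses(lines):
    """Return the set of distinct CCB addresses found in the log lines."""
    found = set()
    for line in lines:
        for match in CCB_ADDR.finditer(line):
            found.add(int(match.group(1), 16))
    return found


addresses = ccb_addresses(LOG_LINES)
print(", ".join("%08x" % addr for addr in sorted(addresses)))
```

A single address in the result means every message
refers to the same CCB.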

Regards, STefan


