From owner-freebsd-scsi  Wed Oct  8 14:13:43 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id OAA14836
          for freebsd-scsi-outgoing; Wed, 8 Oct 1997 14:13:43 -0700 (PDT)
          (envelope-from owner-freebsd-scsi)
Received: from Octopussy.MI.Uni-Koeln.DE (Octopussy.MI.Uni-Koeln.DE [134.95.166.20])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id OAA14829
          for <freebsd-scsi@freebsd.org>; Wed, 8 Oct 1997 14:13:33 -0700 (PDT)
          (envelope-from se@zpr.uni-koeln.de)
Received: from x14.mi.uni-koeln.de ([134.95.219.124]) by Octopussy.MI.Uni-Koeln.DE with SMTP id AA04305
  (5.67b/IDA-1.5 for <freebsd-scsi@FreeBSD.ORG>); Wed, 8 Oct 1997 23:13:12 +0200
Received: (from se@localhost) by x14.mi.uni-koeln.de (8.8.7/8.6.9) id WAA01232; Wed, 8 Oct 1997 22:55:53 +0200 (CEST)
Message-Id: <19971008225552.49139@mi.uni-koeln.de>
Date: Wed, 8 Oct 1997 22:55:52 +0200
From: Stefan Esser <se@FreeBSD.ORG>
To: Philippe Regnauld <regnauld@deepo.prosa.dk>
Cc: freebsd-scsi@FreeBSD.ORG
Subject: Re: 2.2.2 anc NCR875 failures
References: <19971008113725.46245@deepo.prosa.dk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.84
In-Reply-To: <19971008113725.46245@deepo.prosa.dk>; from Philippe Regnauld on Wed, Oct 08, 1997 at 11:37:25AM +0200
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On 1997-10-08 11:37 +0200, Philippe Regnauld <regnauld@deepo.prosa.dk> wrote:
> I just got (a week ago) a new machine to run a keyserver on...
> The configuration is
> 
> TX97/K6-180, NCR-875, 64MB RAM, 2 x 2.2 Atlas II UW disks.
> 
> I've had the following failure three times so far, I would
> guess during some fair amount of disk i/o: (written on paper,
> I'm trying to reread myself):
> 
> 
> ncr0: ERROR (81:0) 8af80 (10/1b) @24:00000000

The NCR is failing on one of the first instructions, and
the error code indicates that an illegal instruction
has been fetched. This was most probably caused by a
jump to the immediate operand of an instruction:

/*--------------------------< START >-----------------------*/ {
	/*
	**	Claim to be still alive ...
	*/
	SCR_COPY (sizeof (((struct ncb *)0)->heartbeat)),
		KVAR (KVAR_TIME_TV_SEC),
		NADDR (heartbeat),
	/*
	**      Make data structure address invalid.
	**      clear SIGP.
	*/
	SCR_LOAD_REG (dsa, 0xff),
		0,
	SCR_FROM_REG (ctest2),
===>>>		0,

The NCR processor tried to execute that constant 0, and
it was not recognized as a valid instruction ...

Hmmm, the (10/1b) in the error message indicates that
synchronous transfers have been negotiated (the offset
is set to 0x10 == 16 bytes), but the clock pre-scaler
(0x1b) does not appear to be set correctly for the 53c875!
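
As a rough illustration of how that pair can be read off a
logged line, here is a minimal sketch. It is not taken from
the NCR driver: the struct and function names are made up
for this example, and it assumes only what is stated above,
namely that the first hex value is the synchronous offset
and the second is the clock pre-scaler setting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type, not part of the NCR driver. */
struct ncr_sync_pair {
	unsigned offset;	/* first hex value: synchronous offset */
	unsigned prescaler;	/* second hex value: clock pre-scaler  */
};

/*
 * Scan a logged error line for the first "(xx/yy)" group and
 * store both hex values.  Groups with a different separator,
 * such as "(81:0)", are skipped.
 * Returns 0 on success, -1 if no such group is present.
 */
int
parse_sync_pair(const char *line, struct ncr_sync_pair *out)
{
	const char *p;

	assert(line != NULL && out != NULL);

	for (p = strchr(line, '('); p != NULL; p = strchr(p + 1, '(')) {
		unsigned a, b;
		char close;

		/* Require "(hex/hex" followed directly by ')'. */
		if (sscanf(p, "(%x/%x%c", &a, &b, &close) == 3 &&
		    close == ')') {
			out->offset = a;
			out->prescaler = b;
			return 0;
		}
	}
	return -1;
}
```

Called on the line quoted above, this fills in offset 0x10
(16 decimal) and pre-scaler 0x1b; a line without a "(xx/yy)"
group makes it return -1.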

But I don't understand how you could possibly complete
a single SCSI transfer at twice the correct clock rate.

You did not say which version of the NCR driver (and
FreeBSD) that is. The pre-scaler may be correct if you
are running the NCR driver as of FreeBSD-2.2.2 and if
the 53c875 is revision 2 or newer.

> In the two other cases, I had some other message, every
> 30 sec. or so, like "retrying block = xxxyyy".  No crash,
> no reboot...

Hmmm, there is no such message anywhere in the NCR
driver.

> I had to go and manually reset the machine (off-site!) every
> time.

Sorry to hear that ...

> I tried reducing TAG number in ncrcontrol -- nada.

No, your problem is different from the QUEUE FULL
situation others are suffering from. But that may
still hurt you if you have revision LXY4 firmware
in your Atlas II drives ...

> Help ?

Please let me know what version of FreeBSD and the
NCR driver you are using. Booting with "-v -v" will
enable extra verbose boot messages, and there will be
more information on the NCR initialization. I'd like
to see those messages.

I'm very sorry for the inconvenience. I'll try to help
you get this problem solved as quickly as possible, but
at the moment it does look like a hardware problem to me.

But it may also be caused by the timing loop used to
measure the NCR 875 clock frequency, which may fail on
your particular hardware for reasons as yet unknown.

Regards, STefan