From owner-cvs-sys  Thu Dec 28 14:07:33 1995
Return-Path: owner-cvs-sys
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id OAA03435
          for cvs-sys-outgoing; Thu, 28 Dec 1995 14:07:33 -0800 (PST)
Received: from Sysiphos (Sysiphos.MI.Uni-Koeln.DE [134.95.212.10])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id OAA03421
          Thu, 28 Dec 1995 14:07:08 -0800 (PST)
Received: by Sysiphos id AA06029
  (5.67b/IDA-1.5); Thu, 28 Dec 1995 23:05:50 +0100
Message-Id: <199512282205.AA06029@Sysiphos>
From: se@zpr.uni-koeln.de (Stefan Esser)
Date: Thu, 28 Dec 1995 23:05:50 +0100
In-Reply-To: "Rodney W. Grimes" <rgrimes@GndRsh.aac.dev.com>
       "Re: cvs commit: src/sys/pci ncr.c" (Dec 28, 12:50)
X-Mailer: Mail User's Shell (7.2.6 alpha(2) 7/9/95)
To: "Rodney W. Grimes" <rgrimes@gndrsh.aac.dev.com>
Subject: Re: cvs commit: src/sys/pci ncr.c
Cc: CVS-committers@freefall.freebsd.org, cvs-sys@freefall.freebsd.org,
        Andrew Russell <arussell@bga.com>, Dmitry Kohmanyuk <dk@dog.farm.org>,
        Joakim Henriksson <murduth@ludd.luth.se>,
        Karl Wiebe <karl@hopf.dnai.com>, Rich Beerman <rbeer@jaguar.cris.com>
Sender: owner-cvs-sys@FreeBSD.ORG
Precedence: bulk

On Dec 28, 12:50, "Rodney W. Grimes" wrote:
} Subject: Re: cvs commit: src/sys/pci ncr.c
} > 
} > se          95/12/28 05:04:05
} > 
} >   Modified:    sys/pci   ncr.c
} >   Log:
} >   Preserve SIGP bit when clearing INTF condition.
} 
} Can you expand upon the ramifications of this fix?  I.e., how does the
} problem it fixes manifest itself, symptoms, etc.

This is supposed to fix the timeouts (which eventually lead to bus resets)
observed on a few systems over the last few months, e.g.:

% ncr0: SCSI phase error fixup: CCB already dequeued (0xf06bdc00)
% ncr0:2: ERROR (80:100) (e-a9-23) (e0/13) @ (1214:0e000000).
%   script cmd = c0000001
%   reg:     da 10 00 13 47 e0 03 1f 00 0e 82 a9 80 00 01 00.
% ncr0: handshake timeout

I've never had it happen on my system, but Gerard Roudier managed to 
reproduce the problem under Linux (while doing the Linux port :) and 
suggested a fix that made his system work reliably.

In a way, I'm surprised this fix makes any difference at all, but I've 
got to believe it ...
(It's hard to believe because the NCR is polled once a second, and SIGP 
is set to 1 on each poll. So even if SIGP could in fact be reset, this 
should have hardly any effect. But I have observed neither that kind of 
pause of a few seconds nor the corresponding console message written by 
the timeout handler.)

The interrupt register (sist) contains a number of status bits, and 
writing a 1 to a bit acknowledges the corresponding interrupt condition. 
Now it seems that SIGP (which makes the NCR start execution when set) 
can be reset by writing a 0 into its bit position.
I don't have the NCR manual here right now, and I can't check whether 
this is in fact documented behaviour, but the patch seems to fix the 
problem.

The previous code assumed that writing 0 bits to any of the registers 
was a NOP, but it may in fact be the case that the SIGP bit is special 
and reacts not only to a 1 being written (as documented), but also 
to a 0 ...

I'm sure that this change can't break anything, since writing a 1 to 
SIGP is allowed at any time. It will just wake up the NCR if it was 
sleeping, and if there is nothing to do, it will go to sleep again.

People who might see an improvement are:

 Andrew Russell <arussell@bga.com>
 David Greenman <davidg@root.com>
 Dmitry Kohmanyuk <dk@dog.farm.org>
 Joakim Henriksson <murduth@ludd.luth.se>
 Karl Wiebe <karl@hopf.dnai.com>
 Rich Beerman <rbeer@jaguar.cris.com>
 Satoshi Asami <asami@cs.berkeley.edu>

Some reported only single failures, and I'm not sure those weren't 
caused by transient effects. 

I'm CCing this message to the above list of people, and I'd like to hear 
whether the problem still existed with a recent version of the NCR driver, 
and whether the fix helps them ...

(David reported a single failure, and I suppose it didn't recur??? 
Satoshi reported timeouts during fsck. In most cases the problems were 
solved by disabling tags (tagged command queuing) or upgrading the 
drive's firmware ...)

Regards, Stefan

-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen		Tel:	+49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln	FAX:	+49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/~se			  <se@ZPR.Uni-Koeln.DE>