From owner-cvs-sys Thu Dec 28 14:07:33 1995
Return-Path: owner-cvs-sys
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id OAA03435 for cvs-sys-outgoing; Thu, 28 Dec 1995 14:07:33 -0800 (PST)
Received: from Sysiphos (Sysiphos.MI.Uni-Koeln.DE [134.95.212.10]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id OAA03421 Thu, 28 Dec 1995 14:07:08 -0800 (PST)
Received: by Sysiphos id AA06029 (5.67b/IDA-1.5); Thu, 28 Dec 1995 23:05:50 +0100
Message-Id: <199512282205.AA06029@Sysiphos>
From: se@zpr.uni-koeln.de (Stefan Esser)
Date: Thu, 28 Dec 1995 23:05:50 +0100
In-Reply-To: "Rodney W. Grimes" "Re: cvs commit: src/sys/pci ncr.c" (Dec 28, 12:50)
X-Mailer: Mail User's Shell (7.2.6 alpha(2) 7/9/95)
To: "Rodney W. Grimes"
Subject: Re: cvs commit: src/sys/pci ncr.c
Cc: CVS-committers@freefall.freebsd.org, cvs-sys@freefall.freebsd.org, Andrew Russell, Dmitry Kohmanyuk, Joakim Henriksson, Karl Wiebe, Rich Beerman
Sender: owner-cvs-sys@FreeBSD.ORG
Precedence: bulk

On Dec 28, 12:50, "Rodney W. Grimes" wrote:
} Subject: Re: cvs commit: src/sys/pci ncr.c
} >
} > se          95/12/28 05:04:05
} >
} >   Modified:    sys/pci      ncr.c
} >   Log:
} >   Preserve SIGP bit when clearing INTF condition.
}
} Can you expand upon the ramifications of this fix?  Ie, how does the
} problem it fix manifest itself, symptoms, etc.

This is supposed to fix the timeouts (which eventually lead to bus resets) observed on a few systems over the last few months, e.g.:

% ncr0: SCSI phase error fixup: CCB already dequeued (0xf06bdc00)
% ncr0:2: ERROR (80:100) (e-a9-23) (e0/13) @ (1214:0e000000).
%             script cmd = c0000001
%             reg: da 10 00 13 47 e0 03 1f 00 0e 82 a9 80 00 01 00.
% ncr0: handshake timeout

I've never had it happen on my system, but Gerard Roudier managed to reproduce the problem under Linux (when doing the Linux port :) and suggested a fix, which made his system work reliably.

In a way, I'm surprised this fix makes any difference at all, but I've got to believe it ...
(It's kind of hard to believe, since the NCR is polled once a second and SIGP is set to 1 on each of these occasions. For this reason, even if it were in fact possible to reset SIGP, it should have hardly any effect. But I observed neither that kind of few-seconds sleep nor the corresponding console message written by the timeout handler.)

The interrupt register (sist) contains a number of status bits, and writing a 1 to such a bit acknowledges recognition of the corresponding interrupt condition. Now it seems that SIGP (which makes the NCR start execution if set) can be reset by writing a 0 into its bit position. I don't have the NCR manual here right now, so I can't check whether this is in fact documented behaviour, but the patch seems to fix the problem.

The previous code assumed that writing 0 bits to any of the registers was a NOP, but it may in fact be that the SIGP bit is special and reacts not only to a 1 being written (as documented), but also to a 0 ...

I'm sure that this change can't break anything, since writing a 1 to SIGP is allowed at any time. It will just wake up the NCR if it was sleeping, and if there is nothing to be done, the NCR will go to sleep again.

People who might see an improvement are:

	Andrew Russell
	David Greenman
	Dmitry Kohmanyuk
	Joakim Henriksson
	Karl Wiebe
	Rich Beerman
	Satoshi Asami

Some of them reported only single failures, and I'm not sure those reports weren't caused by transient effects. I'm CCing this message to the above list of people, and I'd like to hear whether the problem still existed with a recent version of the NCR driver, and whether the fix helps them ...

(David reported a single failure, and I suppose it didn't recur??? And Satoshi reported timeouts during fsck. In most cases the problems were solved by disabling tags or upgrading the drive's firmware ...)
Regards, STefan

-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen      Tel: +49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln    FAX: +49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/~se