From owner-freebsd-scsi  Sat Apr  5 15:58:30 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id PAA09540
          for freebsd-scsi-outgoing; Sat, 5 Apr 1997 15:58:30 -0800 (PST)
Received: from salmon.maths.tcd.ie (mmdf@salmon.maths.tcd.ie [134.226.81.11])
          by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id PAA09532
          for <freebsd-scsi@freebsd.org>; Sat, 5 Apr 1997 15:58:25 -0800 (PST)
Received: from graves.maths.tcd.ie by salmon.maths.tcd.ie  with SMTP id aa08317;
          6 Apr 97 0:58 +0100
To: "Justin T. Gibbs" <gibbs@plutotech.com>
cc: freebsd-scsi@freebsd.org, nops@maths.tcd.ie
Subject: Re: SCSI freezes in 2.2-STABLE 
In-reply-to: Your message of "Sat, 05 Apr 1997 15:06:08 PDT."
             <199704052208.PAA26098@pluto.plutotech.com> 
Date: Sun, 06 Apr 1997 00:58:21 +0100
From: David Malone <dwmalone@maths.tcd.ie>
Message-ID:  <9704060058.aa08317@salmon.maths.tcd.ie>
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> Actually, it helps quite a bit.  I just fixed one bug in the selection 
> timeout handler of the driver that will remove the "referenced scb not 
> valid..." message, but doesn't explain why you got a selection timeout in 
> the first place.  The second bit of interesting information here is that 
> even though we had been reselected (SELDI is true in SSTAT0), the driver 
> didn't see it since the current sequencer address shows we're still 
> looking for work to do.  I made a change to the sequencer that may affect 
> that problem since the poll for work loop was the only place in the driver 
> where we attempted to test for more than a single bit in that register at 
> a time.  It's a shot in the dark, but it may solve this problem.  Let me 
> know how these changes fare for you.  I don't know when the next CTM delta 
> will be generated though...

I'll grab the delta as soon as it appears.

Going up to CTM delta 226 helped a bit; the machine can now at least
fsck the disks when it reboots (so far). We've had two more crashes.
I have a vmcore for the first of them, if it's of any interest. The panic
reason was "Couldn't find busy SCB".

Here are the log messages for the second:

Apr  5 22:58:37 synge /kernel: ahc1: ahc_intr - referenced scb not valid during scsiint 0x8b scb(255)
Apr  5 22:58:37 synge /kernel: SIMODE0 = 0x0, SIMODE1 = 0xa4, SSTAT0 = 0x37
Apr  5 22:58:37 synge /kernel: SEQADDR = 0x5
Apr  5 22:58:47 synge /kernel: sd9(ahc1:1:0): SCB 0x1 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0xe6
Apr  5 22:58:47 synge /kernel: SEQADDR == 0x5
Apr  5 22:58:47 synge /kernel: sd9(ahc1:1:0): Queueing an Abort SCB
Apr  5 22:58:47 synge /kernel: sd8(ahc1:0:0): SCB 0x2 timedout while recovery in progress
Apr  5 22:58:47 synge /kernel: sd9(ahc1:1:0): SCB 0x0 timedout while recovery in progress
Apr  5 22:58:47 synge /kernel: sd8(ahc1:0:0): SCB 0x3 timedout while recovery in progress
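
In case it saves anyone some bit-counting, here is a rough sketch for
decoding the SSTAT0 value from the log. The bit names and masks are an
assumption on my part, taken from my reading of the aic7xxx register
definitions, so please check them against the driver source before
relying on them. The helper name decode_sstat0 is made up for this
example.

```python
# Assumed SSTAT0 bit layout for the aic7xxx (not confirmed in this thread);
# verify against the register definitions in the driver source.
SSTAT0_BITS = {
    0x80: "TARGET",
    0x40: "SELDO",
    0x20: "SELDI",
    0x10: "SELINGO",
    0x08: "SWRAP",
    0x04: "SDONE",
    0x02: "SPIORDY",
    0x01: "DMADONE",
}


def decode_sstat0(value):
    """Return the names of the bits set in an SSTAT0 value, high bit first."""
    return [
        name
        for mask, name in sorted(SSTAT0_BITS.items(), reverse=True)
        if value & mask
    ]


if __name__ == "__main__":
    # 0x37 is the SSTAT0 value reported in the log above.
    print(decode_sstat0(0x37))
```

If that layout is right, 0x37 has SELDI set again, while SEQADDR is
still 0x5, which looks like the same situation you described.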


The machine seems to crash pretty quickly, so we've stopped accepting news
for the moment.

	David.