From owner-freebsd-scsi Sat Apr 5 15:58:30 1997
Return-Path:
Received: (from root@localhost)
        by freefall.freebsd.org (8.8.5/8.8.5) id PAA09540
        for freebsd-scsi-outgoing; Sat, 5 Apr 1997 15:58:30 -0800 (PST)
Received: from salmon.maths.tcd.ie (mmdf@salmon.maths.tcd.ie [134.226.81.11])
        by freefall.freebsd.org (8.8.5/8.8.5) with SMTP id PAA09532
        for ; Sat, 5 Apr 1997 15:58:25 -0800 (PST)
Received: from graves.maths.tcd.ie by salmon.maths.tcd.ie with SMTP
        id aa08317; 6 Apr 97 0:58 +0100
To: "Justin T. Gibbs"
cc: freebsd-scsi@freebsd.org, nops@maths.tcd.ie
Subject: Re: SCSI freezes in 2.2-STABLE
In-reply-to: Your message of "Sat, 05 Apr 1997 15:06:08 PDT."
        <199704052208.PAA26098@pluto.plutotech.com>
Date: Sun, 06 Apr 1997 00:58:21 +0100
From: David Malone
Message-ID: <9704060058.aa08317@salmon.maths.tcd.ie>
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> Actually, it helps quite a bit. I just fixed one bug in the selection
> timeout handler of the driver that will remove the "referenced scb not
> valid..." message, but doesn't explain why you got a selection timeout in
> the first place. The second bit of interesting information here is that
> even though we had been reselected (SELDI is true in SSTAT0), the driver
> didn't see it since the current sequencer address shows we're still
> looking for work to do. I made a change to the sequencer that may affect
> that problem since the poll for work loop was the only place in the driver
> where we attempted to test for more than a single bit in that register at
> a time. Its a shot in the dark, but it may solve this problem. Let me
> know how these changes fair for you. I don't know when the next CTM delta
> will be generated though...

I'll grab the delta as soon as it appears.

Going up to CTM delta 226 helped a bit; the machine can now at least
fsck the disks when it reboots (so far). We've had two more crashes;
I've a vmcore for the first of them, if it's of any interest.
The panic reason was "Couldn't find busy SCB". Here are the log
messages for the second:

Apr 5 22:58:37 synge /kernel: ahc1: ahc_intr - referenced scb not valid during scsiint 0x8b scb(255)
Apr 5 22:58:37 synge /kernel: SIMODE0 = 0x0, SIMODE1 = 0xa4, SSTAT0 = 0x37
Apr 5 22:58:37 synge /kernel: SEQADDR = 0x5
Apr 5 22:58:47 synge /kernel: sd9(ahc1:1:0): SCB 0x1 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0xe6
Apr 5 22:58:47 synge /kernel: SEQADDR == 0x5
Apr 5 22:58:47 synge /kernel: sd9(ahc1:1:0): Queueing an Abort SCB
Apr 5 22:58:47 synge /kernel: sd8(ahc1:0:0): SCB 0x2 timedout while recovery in progress
Apr 5 22:58:47 synge /kernel: sd9(ahc1:1:0): SCB 0x0 timedout while recovery in progress
Apr 5 22:58:47 synge /kernel: sd8(ahc1:0:0): SCB 0x3 timedout while recovery in progress

The machine seems to pretty quickly, so we've stopped accepting news
for the moment.

        David.
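
As an aside on the register dump in the log, here is a rough sketch of how the SSTAT0 value could be split into named bits. The bit names and masks are an assumption recalled from the aic7xxx register definitions, not something stated in this thread, and should be checked against the driver source. `decode_sstat0` is a hypothetical helper, not part of the driver.

```python
# Assumed SSTAT0 bit layout (mask -> name). These values are recalled from
# the aic7xxx register definitions and are NOT confirmed by this thread;
# verify them against the driver source before relying on them.
SSTAT0_BITS = {
    0x80: "TARGET",
    0x40: "SELDO",
    0x20: "SELDI",
    0x10: "SELINGO",
    0x08: "SWRAP",
    0x04: "SDONE",
    0x02: "SPIORDY",
    0x01: "DMADONE",
}


def decode_sstat0(value):
    """Return the names of the bits set in an SSTAT0 value, high bit first."""
    return [
        name
        for mask, name in sorted(SSTAT0_BITS.items(), reverse=True)
        if value & mask
    ]


if __name__ == "__main__":
    # 0x37 is the SSTAT0 value reported in the log above.
    print(decode_sstat0(0x37))
```

Under those assumed masks, 0x37 has the 0x20 bit set, which would correspond to the SELDI (reselected) condition mentioned in the quoted reply.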