From owner-freebsd-smp Fri Aug 8 09:32:26 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id JAA27263 for smp-outgoing; Fri, 8 Aug 1997 09:32:26 -0700 (PDT)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.54]) by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA27254; Fri, 8 Aug 1997 09:32:21 -0700 (PDT)
Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.8.6/8.8.5) id JAA03485; Fri, 8 Aug 1997 09:34:30 GMT
From: Steve Kargl
Message-Id: <199708080934.JAA03485@troutmask.apl.washington.edu>
Subject: Re: scsi time-out & lockup under smp
In-Reply-To: <199708081208.AA04989@iluvatar.unx.sas.com> from "John W. DeBoskey" at "Aug 8, 97 08:08:50 am"
To: jwd@unx.sas.com (John W. DeBoskey)
Date: Fri, 8 Aug 1997 09:34:30 +0000 (GMT)
Cc: freebsd-current@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL32 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

According to John W. DeBoskey:
> Hello,
>
> I'm wondering if anyone might have some information relating to the
> following problem.
>
> I have the 3.0-970731-SNAP installed on a Dell PowerEdge 6100/200,
> a four-processor machine. The problem occurs on either of the two
> aic7880 onboard scsi devices, or a 2940 adapter board, when in
> multi-processor mode. Anywhere from 5 to 60 minutes after booting
> the machine, it freezes with the following messages on the console:
>
> sd0: SCB 0x1 - timed out in command phase, SCSISIGI == 0x86
> SEQADDR = 0x8c SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x3
> sd0: abort message in message buffer
> sd0: SCB 1 - Abort Completed.
> sd0: no longer in timeout
> sd0: SCB 0x1 - timed out while idle, LASTPHASE == 0x1, SCSISIGI = 0x0
> SEQADDR = 0xb SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0x2
> sd0: Queueing an Abort SCB
>
>
> It only seems to occur when I start heavy disk I/O. It
> does not happen in uni-processor mode. The complete output
> from dmesg is appended to this mail. If anyone can help me track this
> down, I'd really appreciate it.
>
> Thanks,
> John
>

It occurs on uni-processor systems, too. If I use dump(1) to back up
my system, I eventually get the following:

st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x5 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
st0(ahc0:2:0): Queueing an Abort SCB
st0(ahc0:2:0): Abort Message Sent
st0(ahc0:2:0): SCB 0 - Abort Completed.
st0(ahc0:2:0): no longer in timeout
st0(ahc0:2:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xc6
SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 3 - Abort Completed.
sd0(ahc0:0:0): no longer in timeout
st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x5 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
st0(ahc0:2:0): SCB 0: Immediate reset. Flags = 0x1
ahc0: Issued Channel A Bus Reset. 3 SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
st0(ahc0:2:0): no longer in timeout
sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
sd0(ahc0:0:0): Power on, reset, or bus device reset occurred, retries:3
sd1(ahc0:1:0): UNIT ATTENTION asc:29,0
sd1(ahc0:1:0): Power on, reset, or bus device reset occurred, retries:4

From dmesg:

ahc0: rev 0x03 int a irq 11 on pci0.12.0
ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
ahc0: waiting for scsi devices to settle
scbus0 at ahc0 bus 0
sd0 at scbus0 target 0 lun 0
sd0: type 0 fixed SCSI 2
sd0: Direct-Access 1030MB (2109840 512 byte sectors)
st0 at scbus0 target 2 lun 0
st0: type 1 removable SCSI 2
st0: Sequential-Access density code 0x13, drive empty

-- 
Steve
finger kargl@troutmask.apl.washington.edu
http://troutmask.apl.washington.edu/~kargl/sgk.html
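When comparing the SMP and uniprocessor reports, it may help to tally which device and which phase each SCB timeout occurred in. The following is a minimal Python sketch, not part of any FreeBSD tool: `TIMEOUT_RE` and `summarize_timeouts` are hypothetical names, and the pattern assumes the console message format seen in the logs in this thread (a device name, an optional `(ahcN:target:lun)` suffix, then `SCB <n> - timed out while idle` or `timed out in <phase> phase`).

```python
import re
from collections import Counter

# Hypothetical pattern, written against the aic7xxx console messages quoted
# in this thread, e.g.
#   st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, ...
#   sd0: SCB 0x1 - timed out in command phase, SCSISIGI == 0x86
# The "(ahcN:target:lun)" suffix is optional because some logs omit it.
# Lines such as "SCB 0 - Abort Completed." deliberately do not match.
TIMEOUT_RE = re.compile(
    r"(?P<dev>[a-z]+\d+)(?:\([^)]*\))?: SCB (?P<scb>0x[0-9a-f]+|\d+) - "
    r"timed out (?P<where>while idle|in \w+ phase)"
)


def summarize_timeouts(log_text):
    """Count SCB timeouts per (device, where) pair in a console log."""
    counts = Counter()
    for m in TIMEOUT_RE.finditer(log_text):
        counts[(m.group("dev"), m.group("where"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
st0(ahc0:2:0): SCB 0 - Abort Completed.
st0(ahc0:2:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xc6
sd0: SCB 0x1 - timed out in command phase, SCSISIGI == 0x86
"""
    # Print one summary line per (device, phase) pair, sorted by device.
    for (dev, where), n in sorted(summarize_timeouts(sample).items()):
        print(f"{dev}: {n} x timed out {where}")
```

Run over the full console output from both machines, a summary like this would show whether the timeouts cluster on one target (for example the tape drive during dump(1)) or in one phase. It is only as reliable as the assumption that the driver's message format stays as shown.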