From owner-freebsd-smp  Fri Aug  8 09:32:26 1997
Return-Path: <owner-freebsd-smp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id JAA27263
          for smp-outgoing; Fri, 8 Aug 1997 09:32:26 -0700 (PDT)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.54])
          by hub.freebsd.org (8.8.5/8.8.5) with ESMTP id JAA27254;
          Fri, 8 Aug 1997 09:32:21 -0700 (PDT)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.8.6/8.8.5) id JAA03485;
	Fri, 8 Aug 1997 09:34:30 GMT
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
Message-Id: <199708080934.JAA03485@troutmask.apl.washington.edu>
Subject: Re: scsi time-out & lockup under smp
In-Reply-To: <199708081208.AA04989@iluvatar.unx.sas.com> from "John W. DeBoskey" at "Aug 8, 97 08:08:50 am"
To: jwd@unx.sas.com (John W. DeBoskey)
Date: Fri, 8 Aug 1997 09:34:30 +0000 (GMT)
Cc: freebsd-current@FreeBSD.ORG, freebsd-smp@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL32 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-smp@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

According to John W. DeBoskey:
> Hello,
> 
>    I'm wondering if anyone might have some information relating to the
> following problem.
> 
>    I have the 3.0-970731-SNAP installed on a Dell PowerEdge 6100/200,
> four processor machine. The problem occurs on either of the two
> aic7880 onboard scsi devices, or a 2940 adapter board, when in 
> multi-processor mode. Anywhere from 5 to 60 minutes after booting
> the machine, it freezes with the following messages on the console:
> 
> sd0: SCB 0x1 - timed out in command phase, SCSISIGI == 0x86
> SEQADDR = 0x8c SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x3
> sd0: abort message in message buffer
> sd0: SCB 1 - Abort Completed.
> sd0: no longer in timeout
> sd0: SCB 0x1 - timed out while idle, LASTPHASE == 0x1, SCSISIGI = 0x0
> SEQADDR = 0xb SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0x2
> sd0: Queueing an Abort SCB
> 
> 
>    It only seems to occur when I start to initiate heavy disk io. It
> does not happen in the uni-processor situation. The complete output
> from dmesg is appended to this mail. If anyone can help me track this
> down, I'd really appreciate it.
> 
> Thanks,
> John
> 


It occurs on a uni-processor system, too.  If I use dump(1) to back up
my system, I eventually get the following:

st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x5 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
st0(ahc0:2:0): Queueing an Abort SCB
st0(ahc0:2:0): Abort Message Sent
st0(ahc0:2:0): SCB 0 - Abort Completed.
st0(ahc0:2:0): no longer in timeout
st0(ahc0:2:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xc6
SEQADDR = 0x42 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 3 - Abort Completed.
sd0(ahc0:0:0): no longer in timeout
st0(ahc0:2:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x5 SCSISEQ = 0x12 SSTAT0 = 0x5 SSTAT1 = 0xa
st0(ahc0:2:0): SCB 0: Immediate reset.  Flags = 0x1
ahc0: Issued Channel A Bus Reset. 3 SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
st0(ahc0:2:0): no longer in timeout
sd0(ahc0:0:0): UNIT ATTENTION asc:29,0
sd0(ahc0:0:0):  Power on, reset, or bus device reset occurred, retries:3
sd1(ahc0:1:0): UNIT ATTENTION asc:29,0
sd1(ahc0:1:0):  Power on, reset, or bus device reset occurred, retries:4


Relevant dmesg output:

ahc0: <Adaptec 2940 SCSI host adapter> rev 0x03 int a irq 11 on pci0.12.0
ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
ahc0: waiting for scsi devices to settle
scbus0 at ahc0 bus 0
sd0 at scbus0 target 0 lun 0
sd0: <SEAGATE ST51080N 0943> type 0 fixed SCSI 2
sd0: Direct-Access 1030MB (2109840 512 byte sectors)
st0 at scbus0 target 2 lun 0
st0: <HP HP35480A 1109> type 1 removable SCSI 2
st0: Sequential-Access density code 0x13,  drive empty

-- 
Steve

finger kargl@troutmask.apl.washington.edu
http://troutmask.apl.washington.edu/~kargl/sgk.html