From owner-freebsd-hardware  Thu Mar 13 08:06:36 1997
Return-Path: <owner-hardware>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id IAA04543
          for hardware-outgoing; Thu, 13 Mar 1997 08:06:36 -0800 (PST)
Received: from seine.cs.umd.edu (seine.cs.umd.edu [128.8.128.59])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA04523
          for <hardware@freebsd.org>; Thu, 13 Mar 1997 08:06:32 -0800 (PST)
Received: by seine.cs.umd.edu (8.8.5/UMIACS-0.9/04-05-88)
	id LAA23608; Thu, 13 Mar 1997 11:06:30 -0500 (EST)
Message-Id: <199703131606.LAA23608@seine.cs.umd.edu>
To: hardware@freebsd.org
cc: Rodney Grimes <rgrimes@gndrsh.aac.dev.com>
Subject: dump / dd / amdump crashing FreeBSD 2.1.5 / 2.2 machines
Date: Thu, 13 Mar 1997 11:06:30 -0500
From: Rohit Dube <rohit@cs.umd.edu>
Sender: owner-hardware@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


I am seeing some weird problems with a couple of machines
running 2.2-970225-GAMMA.

Every night when we run amanda's 'amdump', these machines
crash. The crash can also be triggered by a 'dump'
to /dev/null or a 'dd'. (It is not entirely deterministic, but all 3
crash the machines most of the time.) Switching the kernel to
2.1.5 does not solve the problem either.

We have the following hardware on the machines which are crashing
(abridged dmesg output showing only the PCI devices):

Probing for devices on PCI bus 0:
chip0 <Intel 82439> rev 3 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:9
de0 <Digital 21140A Fast Ethernet> rev 32 int a irq 10 on pci0:11
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:c0:03:6b:f9
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "MICROP 4421-07   0329SJ 0329" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2047MB (4193360 512 byte sectors)
(ahc0:6:0): "SONY CD-ROM CDU-76S 1.2d" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM cd present [400000 x 2048 byte records]


The console shows the following error messages (which are not logged, as
the disk is inaccessible):

sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset: 2SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
sd0(ahc0:0:0): SCB 0x1 - timed out while idle
               LASTPHASE == 0x1, SCSIISGI = 0x0
	       SEQADDR == 0x12

The above message repeats with different values for SEQADDR.

The first message which gets printed out says something like
'timed out in command phase'. I can't quote it exactly here, as it happened
in the middle of the night and scrolled off the screen.

Has anybody else seen a problem like this before, or does anyone know
what is going on here?

Any help is greatly appreciated! We just can't afford to have these machines
go down every night while doing a backup!

Thanks.

--rohit.

PS: I am attaching the output of 'scsi -f /dev/rsd0 -m1' and 'df' here,
    if that is of any use in tracking this problem.

# scsi -f /dev/rsd0 -m1
AWRE (Auto Write Reallocation Enbld):  1 
ARRE (Auto Read Reallocation Enbld):  1 
TB (Transfer Block):  0 
RC (Read Continuous):  0 
EER (Enable Early Recovery):  0 
PER (Post Error):  0 
DTE (Disable Transfer on Error):  0 
DCR (Disable Correction):  0 
Read Retry Count:  14 
Correction Span:  28 
Head Offset Count:  0 
Data Strobe Offset Count:  0 
Write Retry Count:  15 
Recovery Time Limit:  0 

# df
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0a       47183    13098    30311    30%    /
/dev/sd0s1f   1822738   504147  1172772    30%    /usr
/dev/sd0s1e     98479     1372    89229     2%    /var
procfs              4        4        0   100%    /proc
amd:96              0        0        0   100%    /fs