Date:      Thu, 13 Mar 1997 11:06:30 -0500
From:      Rohit Dube <rohit@cs.umd.edu>
To:        hardware@freebsd.org
Cc:        Rodney Grimes <rgrimes@gndrsh.aac.dev.com>
Subject:   dump / dd / amdump crashing FreeBSD 2.1.5 / 2.2 machines
Message-ID:  <199703131606.LAA23608@seine.cs.umd.edu>

I am seeing some weird problems with a couple of machines
running 2.2-970225-GAMMA.

Every night when we run Amanda's 'amdump', these machines
crash. The crash can also be triggered by a 'dump'
to /dev/null or a 'dd'. (It is not entirely deterministic, but all three
crash the machines most of the time.) Switching the kernel to
2.1.5 does not solve the problem either.
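
For anyone who wants to try this on similar hardware, the triggers are
roughly of the following shape. This is only a sketch: the block size,
the dump level, the choice of /usr, and the 'repro' helper itself are
illustrative assumptions, not the literal commands we ran. By default it
only prints the commands; running it with RUN set to the empty string
executes them.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. DISK, FS, the block size and the
# repro() helper are assumptions made for illustration.
DISK=${DISK-/dev/rsd0}   # raw device of the SCSI disk
FS=${FS-/usr}            # filesystem to dump (assumed)
RUN=${RUN-echo}          # unset: print the commands only; RUN= executes them

repro() {
    # Sequential read of the raw disk, data discarded.
    $RUN dd if="$DISK" of=/dev/null bs=64k
    # Level-0 dump of one filesystem, output discarded.
    $RUN dump 0f /dev/null "$FS"
}

repro
```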

We have the following hardware on the machines which are crashing
(abridged dmesg output showing only the PCI devices):

Probing for devices on PCI bus 0:
chip0 <Intel 82439> rev 3 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:9
de0 <Digital 21140A Fast Ethernet> rev 32 int a irq 10 on pci0:11
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:c0:03:6b:f9
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "MICROP 4421-07   0329SJ 0329" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2047MB (4193360 512 byte sectors)
(ahc0:6:0): "SONY CD-ROM CDU-76S 1.2d" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM cd present [400000 x 2048 byte records]


The console shows the following error messages (which are not logged as
the disk is inaccessible):

sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset: 2SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
sd0(ahc0:0:0): SCB 0x1 - timed out while idle
               LASTPHASE == 0x1, SCSISIGI = 0x0
               SEQADDR == 0x12

The above message repeats with different values for SEQADDR.

The first message which gets printed out says something like
'timed out in command phase'. I can't quote it exactly here as it happened
in the middle of the night and scrolled off.

Has somebody else seen a problem like this before? Or would otherwise know
what is going on here?

Any help greatly appreciated! Just can't afford to have these machines
go down every night while doing a backup!!

Thanks.

--rohit.

PS: I am attaching the output of 'scsi -f /dev/rsd0 -m1' and 'df' here,
    if that is of any use in tracking this problem.

#scsi -f /dev/rsd0 -m1
AWRE (Auto Write Reallocation Enbld):  1 
ARRE (Auto Read Reallocation Enbld):  1 
TB (Transfer Block):  0 
RC (Read Continuous):  0 
EER (Enable Early Recovery):  0 
PER (Post Error):  0 
DTE (Disable Transfer on Error):  0 
DCR (Disable Correction):  0 
Read Retry Count:  14 
Correction Span:  28 
Head Offset Count:  0 
Data Strobe Offset Count:  0 
Write Retry Count:  15 
Recovery Time Limit:  0 

# df
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0a       47183    13098    30311    30%    /
/dev/sd0s1f   1822738   504147  1172772    30%    /usr
/dev/sd0s1e     98479     1372    89229     2%    /var
procfs              4        4        0   100%    /proc
amd:96              0        0        0   100%    /fs


