Date: Thu, 13 Mar 1997 11:06:30 -0500
From: Rohit Dube <rohit@cs.umd.edu>
To: hardware@freebsd.org
Cc: Rodney Grimes <rgrimes@gndrsh.aac.dev.com>
Subject: dump / dd / amdump crashing FreeBSD 2.1.5 / 2.2 machines
Message-ID: <199703131606.LAA23608@seine.cs.umd.edu>
I am seeing some weird problems with a couple of machines
running 2.2-970225-GAMMA.
Every night when we run amanda's 'amdump', these machines
crash. The crash can also be triggered by a 'dump'
to /dev/null or by a 'dd'. (It is not entirely deterministic, but all three
crash the machines most of the time.) Switching the kernel to
2.1.5 does not solve the problem either.
The crashing machines have the following hardware
(dmesg output trimmed to show only the PCI devices):
Probing for devices on PCI bus 0:
chip0 <Intel 82439> rev 3 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:9
de0 <Digital 21140A Fast Ethernet> rev 32 int a irq 10 on pci0:11
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:c0:03:6b:f9
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "MICROP 4421-07 0329SJ 0329" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2047MB (4193360 512 byte sectors)
(ahc0:6:0): "SONY CD-ROM CDU-76S 1.2d" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM cd present [400000 x 2048 byte records]
The console shows the following error messages (they are not logged,
as the disk is inaccessible):
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset: 2 SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
sd0(ahc0:0:0): SCB 0x1 - timed out while idle
LASTPHASE == 0x1, SCSISIGI = 0x0
SEQADDR == 0x12
The above message repeats with different values for SEQADDR.
The first message printed says something like
'timed out in command phase'. I can't quote it exactly, as it happened
in the middle of the night and scrolled off the screen.
Has anyone else seen a problem like this before, or does anyone know
what is going on here?
Any help would be greatly appreciated! We just can't afford to have these
machines go down every night during backups!
Thanks.
--rohit.
PS: I am attaching the output of 'scsi -f /dev/rsd0 -m1' and 'df' below,
in case it is of any use in tracking down this problem.
#scsi -f /dev/rsd0 -m1
AWRE (Auto Write Reallocation Enbld): 1
ARRE (Auto Read Reallocation Enbld): 1
TB (Transfer Block): 0
RC (Read Continuous): 0
EER (Enable Early Recovery): 0
PER (Post Error): 0
DTE (Disable Transfer on Error): 0
DCR (Disable Correction): 0
Read Retry Count: 14
Correction Span: 28
Head Offset Count: 0
Data Strobe Offset Count: 0
Write Retry Count: 15
Recovery Time Limit: 0
# df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/sd0a 47183 13098 30311 30% /
/dev/sd0s1f 1822738 504147 1172772 30% /usr
/dev/sd0s1e 98479 1372 89229 2% /var
procfs 4 4 0 100% /proc
amd:96 0 0 0 100% /fs
