Date: Thu, 13 Mar 1997 11:06:30 -0500
From: Rohit Dube <rohit@cs.umd.edu>
To: hardware@freebsd.org
Cc: Rodney Grimes <rgrimes@gndrsh.aac.dev.com>
Subject: dump / dd / amdump crashing FreeBSD 2.1.5 / 2.2 machines
Message-ID: <199703131606.LAA23608@seine.cs.umd.edu>

I am seeing some weird problems with a couple of machines running 2.2-970225-GAMMA. Every night when we run amanda's 'amdump', these machines crash. The crash can also be triggered by a 'dump' to /dev/null or a 'dd'. (Not entirely deterministic, but all 3 crash the machines most of the time.) Switching the kernel to 2.1.5 does not solve the problem either.

We have the following hardware on the machines which are crashing (curtailed dmesg output showing only the PCI devices):

Probing for devices on PCI bus 0:
chip0 <Intel 82439> rev 3 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
vga0 <VGA-compatible display device> rev 0 int a irq 12 on pci0:9
de0 <Digital 21140A Fast Ethernet> rev 32 int a irq 10 on pci0:11
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:c0:03:6b:f9
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 0 int a irq 11 on pci0:12
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "MICROP 4421-07 0329SJ 0329" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2047MB (4193360 512 byte sectors)
(ahc0:6:0): "SONY CD-ROM CDU-76S 1.2d" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM cd present [400000 x 2048 byte records]

The console shows the following error messages (which are not logged, as the disk is inaccessible):

sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset: 2SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
Sd0(ahc0:0:0): SCB 0x1 - timed out while idle LASTPHASE == 0x1, SCSIISGI = 0x0
SEQADDR == 0x12

The above message repeats with different values for SEQADDR. The first message printed says something like 'timed out in command phase'. I can't quote it exactly here, as it happened in the middle of the night and scrolled off.

Has somebody else seen a problem like this before? Or would otherwise know what is going on here? Any help greatly appreciated!
Just can't afford to have these machines go down every night while doing a backup!!

Thanks.
--rohit.

PS: I am attaching the output of 'scsi -f /dev/rsd0 -m1' and 'df' here, if that is of any use in tracking this problem.

# scsi -f /dev/rsd0 -m1
AWRE (Auto Write Reallocation Enbld): 1
ARRE (Auto Read Reallocation Enbld): 1
TB (Transfer Block): 0
RC (Read Continuous): 0
EER (Enable Early Recovery): 0
PER (Post Error): 0
DTE (Disable Transfer on Error): 0
DCR (Disable Correction): 0
Read Retry Count: 14
Correction Span: 28
Head Offset Count: 0
Data Strobe Offset Count: 0
Write Retry Count: 15
Recovery Time Limit: 0

# df
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0a        47183    13098    30311    30%    /
/dev/sd0s1f    1822738   504147  1172772    30%    /usr
/dev/sd0s1e      98479     1372    89229     2%    /var
procfs               4        4        0   100%    /proc
amd:96               0        0        0   100%    /fs