From owner-freebsd-hardware Thu Mar 13 08:06:36 1997
Return-Path:
Received: (from root@localhost)
        by freefall.freebsd.org (8.8.5/8.8.5) id IAA04543
        for hardware-outgoing; Thu, 13 Mar 1997 08:06:36 -0800 (PST)
Received: from seine.cs.umd.edu (seine.cs.umd.edu [128.8.128.59])
        by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id IAA04523
        for ; Thu, 13 Mar 1997 08:06:32 -0800 (PST)
Received: by seine.cs.umd.edu (8.8.5/UMIACS-0.9/04-05-88)
        id LAA23608; Thu, 13 Mar 1997 11:06:30 -0500 (EST)
Message-Id: <199703131606.LAA23608@seine.cs.umd.edu>
To: hardware@freebsd.org
cc: Rodney Grimes
Subject: dump / dd / amdump crashing FreeBSD 2.1.5 / 2.2 machines
Date: Thu, 13 Mar 1997 11:06:30 -0500
From: Rohit Dube
Sender: owner-hardware@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I am seeing some weird problems with a couple of machines running
2.2-970225-GAMMA. Every night when we run amanda's 'amdump', these
machines crash. The crash can also be triggered by a 'dump' to /dev/null
or a 'dd'. (Not entirely deterministic, but all 3 crash the machines most
of the time.) Switching the kernel to 2.1.5 does not solve the problem
either.

We have the following hardware on the machines which are crashing
(curtailed dmesg output showing only the PCI devices):

Probing for devices on PCI bus 0:
chip0 rev 3 on pci0:0
chip1 rev 1 on pci0:7:0
chip2 rev 0 on pci0:7:1
vga0 rev 0 int a irq 12 on pci0:9
de0 rev 32 int a irq 10 on pci0:11
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.0
de0: address 00:00:c0:03:6b:f9
ahc0 rev 0 int a irq 11 on pci0:12
ahc0: aic7880 Single Channel, SCSI Id=7, 16 SCBs
ahc0 waiting for scsi devices to settle
(ahc0:0:0): "MICROP 4421-07 0329SJ 0329" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 2047MB (4193360 512 byte sectors)
(ahc0:6:0): "SONY CD-ROM CDU-76S 1.2d" type 5 removable SCSI 2
cd0(ahc0:6:0): CD-ROM cd present [400000 x 2048 byte records]

The console shows the following error messages (which are not logged, as
the disk is inaccessible):

sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset: 2SCBs aborted
Clearing bus reset
Clearing 'in-reset' flag
Sd0(ahc0:0:0): SCB 0x1 - timed out while idle LASTPHASE == 0x1, SCSIISGI = 0x0
SEQADDR == 0x12

The above message repeats with different values for SEQADDR. The first
message that gets printed says something like 'timed out in command
phase'. I can't quote it exactly here, as it happened in the middle of the
night and scrolled off the screen.

Has anybody else seen a problem like this before, or does anyone know
what is going on here? Any help greatly appreciated! We just can't afford
to have these machines go down every night while doing a backup!

Thanks.
--rohit.

PS: I am attaching the output of 'scsi -f /dev/rsd0 -m1' and 'df' here, in
case that is of any use in tracking this problem.

# scsi -f /dev/rsd0 -m1
AWRE (Auto Write Reallocation Enbld):  1
ARRE (Auto Read Reallocation Enbld):  1
TB (Transfer Block):  0
RC (Read Continuous):  0
EER (Enable Early Recovery):  0
PER (Post Error):  0
DTE (Disable Transfer on Error):  0
DCR (Disable Correction):  0
Read Retry Count:  14
Correction Span:  28
Head Offset Count:  0
Data Strobe Offset Count:  0
Write Retry Count:  15
Recovery Time Limit:  0

# df
Filesystem   1K-blocks     Used    Avail Capacity  Mounted on
/dev/sd0a        47183    13098    30311    30%    /
/dev/sd0s1f    1822738   504147  1172772    30%    /usr
/dev/sd0s1e      98479     1372    89229     2%    /var
procfs               4        4        0   100%    /proc
amd:96               0        0        0   100%    /fs