Date: Tue, 2 Jan 2007 07:36:08 -0800 From: Jeremy Chadwick <koitsu@FreeBSD.org> To: freebsd-stable@FreeBSD.org Subject: Interrupt (SCSI?) hang on 4.x Message-ID: <20070102153608.GA78405@icarus.home.lan>
next in thread | raw e-mail | index | archive | help
Yes, I know 4.11 is EOL'd at the end of this month, but hopefully someone can shed some light on this problem anyways. I simply don't have the knowledge of what's going on on a low-level to determine the cause. I do have serial console on this box, and after enabling some debugging for the ahc(4) driver a few months back, was able to get something intelligent out of the system regarding SCBs this morning. This may not be useful (or the cause), though. I also cannot enable drop-to-DDB-on-serial-break because our Portmaster 2 has been known to send a serial break on rare occasion. :-( Every so often (sometimes hours, sometimes months -- usually months), the 4.11 box we have "locks up" in the sense that both NICs on the box stop working, and the SCSI controller also appears hung. This problem has existed for a couple years; it's not specific to 4.11 (versus 4.10 or 4.9). I have to hard reset or power cycle the box to get it working again. The problem will continue indefinitely until the machine is reset; meaning it does not recover on its own. Naturally this means quite an ugly fsck when the machine comes back up. The initial symptoms are: fxp0: device timeout fxp1: device timeout ahc0: Timedout SCBs already complete. Interrupts may not be functioning. Hardware: * 2x Pentium III 933MHz * Tyan Tiger 200 - VIA NB/SB chipset (probably the cause of all this :) ) - Mainboard running latest BIOS - 2x Intel 82559 on-board NIC * 512MB RAM (ECC; has built world for years no problem) * Adaptec 29160 U160 controller (physical card, not on-board); not sure what Adaptec BIOS revision (anyway to check via FreeBSD?) * Hard disk is a single 16GB U160/SCSI-3 drive * Kernel is SMP Devices and associated IRQs: fxp0: <Intel 82559 Pro/100 Ethernet> port 0xe000-0xe03f mem 0xd6000000-0xd60fffff,0xd6202000-0xd6202fff irq 10 at device 13.0 on pci0 fxp1: <Intel 82559 Pro/100 Ethernet> port 0xe400-0xe43f mem 0xd6100000-0xd61fffff,0xd6201000-0xd6201fff irq 11 at device 14.0 on pci0 ahc0: <Adaptec 29160B Ultra160 SCSI adapter> port 0xe800-0xe8ff mem 0xd6203000-0xd6203fff irq 11 at device 16.0 on pci0 da0: <IBM DDYS-T18350N S96H> Fixed Direct Access SCSI-3 device # vmstat -i ata0 irq14 6 0 fxp0 irq10 14874 28 mux irq11 65028 125 fdc0 irq6 1 0 sio0 irq4 948 1 clk irq0 516187 998 rtc irq8 66071 127 Total 663115 1282 # pciconf -l agp0@pci0:0:0: class=0x060000 card=0x00000000 chip=0x06911106 rev=0xc4 hdr=0x00 pcib2@pci0:1:0: class=0x060400 card=0x00000080 chip=0x85981106 rev=0x00 hdr=0x01 none0@pci0:6:0: class=0x030000 card=0x00081002 chip=0x47521002 rev=0x27 hdr=0x00 isab0@pci0:7:0: class=0x060100 card=0x00001106 chip=0x06861106 rev=0x40 hdr=0x00 atapci0@pci0:7:1: class=0x01018a card=0x05711106 chip=0x05711106 rev=0x06 hdr=0x00 viapropm0@pci0:7:4: class=0x060000 card=0x00000000 chip=0x30571106 rev=0x40 hdr=0x00 fxp0@pci0:13:0: class=0x020000 card=0x000c8086 chip=0x12298086 rev=0x08 hdr=0x00 fxp1@pci0:14:0: class=0x020000 card=0x000c8086 chip=0x12298086 rev=0x08 hdr=0x00 ahc0@pci0:16:0: class=0x010000 card=0xe2209005 chip=0x00809005 rev=0x02 hdr=0x00 I can include my kernel configuration if need be, but it's fairly standard. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB | === SNIP === FreeBSD/i386 (pentarou.parodius.com) (ttyd0) login: fxp0: device timeout ahc0: Recovery Initiated >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< ahc0: Dumping Card State in Message-out phase, at SEQADDR 0x16c Card was paused ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xe4, ARG_2 = 0x10 HCNT = 0x0 SCBPTR = 0x1f SCSIPHASE[0x4] SCSISIGI[0xb6] ERROR[0x0] SCSIBUSL[0x41] LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0] SEQCTL[0x10] SEQ_FLAGS[0x40] SSTAT0[0x2] SSTAT1[0x1] SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac] SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89] STACK: 0xe2 0xe2 0xe2 0x179 SCB count = 130 Kernel NEXTQSCB = 98 Card NEXTQSCB = 124 QINFIFO entries: 124 76 86 37 106 30 87 80 59 104 127 110 22 Waiting Queue entries: Disconnected Queue entries: QOUTFIFO entries: Sequencer Free SCB List: 24 10 4 12 17 22 15 5 7 28 8 18 30 6 23 26 14 21 1 19 27 29 11 25 3 2 9 13 20 16 0 Sequencer SCB Info: 0 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 1 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 4 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 7 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 8 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 9 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 10 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 11 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 12 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 13 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 14 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 15 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 16 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 17 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 18 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 19 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 20 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 21 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 22 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 23 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 24 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 25 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 26 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 27 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 28 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 29 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 30 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 31 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x41] Pending list: 22 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 110 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 127 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 104 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 59 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 80 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 87 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 30 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 106 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 37 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 86 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 76 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 124 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 65 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] Kernel Free SCB list: 5 118 81 94 27 97 14 56 1 115 93 128 33 41 31 36 12 54 64 79 4 55 63 107 70 119 15 39 77 69 66 17 67 95 58 16 75 100 53 29 47 125 60 111 71 10 129 114 82 25 35 99 117 83 44 38 123 92 74 126 90 85 50 46 32 68 45 21 48 102 96 57 42 89 43 78 109 62 23 72 0 116 120 2 11 105 20 103 52 101 26 24 121 51 122 40 112 18 34 84 73 13 7 91 28 108 19 9 8 88 3 61 49 6 113 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> (da0:ahc0:0:0:0): SCB 0x16 - timed out sg[0] - Addr 0x6b39000 : Length 4096 sg[1] - Addr 0x6bba000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x6e - timed out sg[0] - Addr 0x9f79000 : Length 4096 sg[1] - Addr 0x1b63a000 : Length 2560 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x7f - timed out sg[0] - Addr 0x64e7000 : Length 4096 sg[1] - Addr 0x64c8000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x68 - timed out sg[0] - Addr 0x88e6000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x3b - timed out sg[0] - Addr 0x5e05000 : Length 4096 sg[1] - Addr 0x2fa6000 : Length 2048 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x50 - timed out sg[0] - Addr 0xef5b000 : Length 4096 sg[1] - Addr 0xf05c000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x57 - timed out sg[0] - Addr 0x14c7000 : Length 4096 sg[1] - Addr 0x111a8000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x1e - timed out sg[0] - Addr 0x19de9000 : Length 4096 sg[1] - Addr 0x1cdaa000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x6a - timed out sg[0] - Addr 0x725d000 : Length 4096 sg[1] - Addr 0x709e000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x25 - timed out sg[0] - Addr 0x1db1b000 : Length 4096 sg[1] - Addr 0xf95c000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x56 - timed out sg[0] - Addr 0x91b1000 : Length 4096 sg[1] - Addr 0x94f2000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x4c - timed out sg[0] - Addr 0x7087000 : Length 4096 sg[1] - Addr 0x6f48000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x7c - timed out sg[0] - Addr 0x63b1000 : Length 4096 sg[1] - Addr 0x66b2000 : Length 4096 (da0:ahc0:0:0:0): Other SCB Timeout (da0:ahc0:0:0:0): SCB 0x41 - timed out sg[0] - Addr 0x62cd000 : Length 4096 sg[1] - Addr 0x656e000 : Length 4096 (da0:ahc0:0:0:0): BDR message in message buffer ahc0: Timedout SCBs already complete. Interrupts may not be functioning. ahc0: Recovery Initiated >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<< ahc0: Dumping Card State in Message-out phase, at SEQADDR 0x16c Card was paused ACCUM = 0xa0, SINDEX = 0x61, DINDEX = 0xe4, ARG_2 = 0x10 HCNT = 0x0 SCBPTR = 0x1f SCSIPHASE[0x4] SCSISIGI[0xb6] ERROR[0x0] SCSIBUSL[0x6] LASTPHASE[0xa0] SCSISEQ[0x12] SBLKCTL[0xa] SCSIRATE[0x0] SEQCTL[0x10] SEQ_FLAGS[0x40] SSTAT0[0x2] SSTAT1[0x1] SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac] SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89] STACK: 0xe2 0xe2 0xe2 0x179 SCB count = 130 Kernel NEXTQSCB = 98 Card NEXTQSCB = 124 QINFIFO entries: 124 76 86 37 106 30 87 80 59 104 127 110 22 Waiting Queue entries: Disconnected Queue entries: QOUTFIFO entries: Sequencer Free SCB List: 24 10 4 12 17 22 15 5 7 28 8 18 30 6 23 26 14 21 1 19 27 29 11 25 3 2 9 13 20 16 0 Sequencer SCB Info: 0 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 1 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 2 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 3 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 4 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 5 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 6 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 7 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 8 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 9 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 10 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 11 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 12 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 13 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 14 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 15 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 16 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 17 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 18 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 19 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 20 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 21 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 22 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 23 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 24 SCB_CONTROL[0x0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 25 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 26 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 27 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 28 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 29 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 30 SCB_CONTROL[0xe0] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0xff] 31 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] SCB_TAG[0x41] Pending list: 22 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 110 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 127 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 104 SCB_CONTROL[0x72] SCB_SCSIID[0x7] SCB_LUN[0x0] 59 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 80 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 87 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 30 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 106 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 37 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 86 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 76 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 124 SCB_CONTROL[0x70] SCB_SCSIID[0x7] SCB_LUN[0x0] 65 SCB_CONTROL[0x60] SCB_SCSIID[0x7] SCB_LUN[0x0] Kernel Free SCB list: 5 118 81 94 27 97 14 56 1 115 93 128 33 41 31 36 12 54 64 79 4 55 63 107 70 119 15 39 77 69 66 17 67 95 58 16 75 100 53 29 47 125 60 111 71 10 129 114 82 25 35 99 117 83 44 38 123 92 74 126 90 85 50 46 32 68 45 21 48 102 96 57 42 89 43 78 109 62 23 72 0 116 120 2 11 105 20 103 52 101 26 24 121 51 122 40 112 18 34 84 73 13 7 91 28 108 19 9 8 88 3 61 49 6 113 <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>> (da0:ahc0:0:0:0): SCB 0x41 - timed out sg[0] - Addr 0x62cd000 : Length 4096 sg[1] - Addr 0x656e000 : Length 4096 (da0:ahc0:0:0:0): no longer in timeout, status = 34b ahc0: Issued Channel A Bus Reset. 14 SCBs aborted ahc0: Timedout SCBs already complete. Interrupts may not be functioning. fxp1: device timeout
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070102153608.GA78405>