From owner-freebsd-bugs Fri Jan 19 15:40:23 2001
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
    by hub.freebsd.org (Postfix) with ESMTP id E0C0437B698
    for ; Fri, 19 Jan 2001 15:40:01 -0800 (PST)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.11.1/8.11.1) id f0JNe1D90896;
    Fri, 19 Jan 2001 15:40:01 -0800 (PST)
    (envelope-from gnats)
Received: from postal.incyte.com (postal.incyte.com [198.31.37.2])
    by hub.freebsd.org (Postfix) with ESMTP id 62CD137B400
    for ; Fri, 19 Jan 2001 15:36:28 -0800 (PST)
Received: from blah.incyte.com (blah.incyte.com [10.99.1.40])
    by postal.incyte.com (8.11.1/8.11.1) with ESMTP id f0JNaNf02061;
    Fri, 19 Jan 2001 15:36:23 -0800 (PST)
Received: from blah.incyte.com (bl@localhost)
    by blah.incyte.com (8.9.1a/8.9.1) with ESMTP id PAA158459;
    Fri, 19 Jan 2001 15:36:27 -0800 (PST)
Message-Id: <200101192336.PAA158459@blah.incyte.com>
Date: Fri, 19 Jan 2001 15:36:27 -0800
From: "Brett G. Lemoine"
Reply-To: bl@incyte.com
To: FreeBSD-gnats-submit@freebsd.org
Cc: bl@incyte.com
X-Send-Pr-Version: 3.2
Subject: i386/24469: system hangs on scsi disk access error
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Number: 24469
>Category: i386
>Synopsis: system hangs on scsi disk access error
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri Jan 19 15:40:01 PST 2001
>Closed-Date:
>Last-Modified:
>Originator: Brett G Lemoine
>Release: FreeBSD 4.2-RELEASE i386
>Organization: Incyte Genomics, Inc
>Environment:
TYAN Thunderbolt S1837 motherboard w/ onboard Adaptec AIC-7896 dual channel Ultra2 LVD SCSI

FreeBSD 4.2-RELEASE #1: Fri Jan 12 19:52:23 CST 2001
root@blur.unixshaman.com:/usr/src/sys/compile/SHAMAN
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (751.71-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x681 Stepping = 1
  Features=0x383fbff
real memory = 1073741824 (1048576K bytes)
config> di aha0
config> q
avail memory = 1042231296 (1017804K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0392000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc039209c.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib2: at device 1.0 on pci0
pci1: on pcib2
pcib3: at device 1.0 on pci1
pci2: on pcib3
pci2: at 1.0
pci2: at 2.0
pci2: at 3.0
pci2: at 4.0
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: at device 7.2 on pci0
uhci0: Invalid irq 255
uhci0: Please switch on USB support and switch PNP-OS to 'No' in BIOS
device_probe_and_attach: uhci0 attach returned 6
Timecounter "PIIX" frequency 3579545 Hz
chip1: port 0x440-0x44f at device 7.3 on pci0
ahc0: port 0xe400-0xe4ff mem 0xfebfe000-0xfebfefff irq 16 at device 11.0 on pci0
aic7896/97: Wide Channel A, SCSI Id=7, 32/255 SCBs
ahc1: port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 16 at device 11.1 on pci0
aic7896/97: Wide Channel B, SCSI Id=7, 32/255 SCBs
pcm0: port 0xef00-0xef3f irq 18 at device 12.0 on pci0
fxp0: port 0xee80-0xeebf mem 0xfea00000-0xfeafffff,0xfebfd000-0xfebfdfff irq 19 at device 13.0 on pci0
fxp0: Ethernet address 00:e0:81:10:c9:0e
fxp1: port 0xed80-0xedbf mem 0xfe800000-0xfe8fffff,0xfebfc000-0xfebfcfff irq 17 at device 17.0 on pci0
fxp1: Ethernet address 00:d0:b7:73:39:03
pcib1: on motherboard
pci3: on pcib1
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppi0: on ppbus0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
SMP: AP CPU #1 Launched!
acd0: CDROM at ata1-master using PIO4
Waiting 5 seconds for SCSI devices to settle
Mounting root from ufs:/dev/da0s1a
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)
WARNING: / was not properly dismounted
cd0 at ahc1 bus 0 target 5 lun 0
cd0: Removable CD-ROM SCSI-2 device
cd0: 20.000MB/s transfers (20.000MHz, offset 15)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
da2 at ahc1 bus 0 target 6 lun 0
da2: Removable Direct Access SCSI-2 device
da2: 20.000MB/s transfers (20.000MHz, offset 15)
da2: Attempt to query device size failed: NOT READY, Medium not present
pid 217 (Xaccel): trap 12 with interrupts disabled
pid 217 (Xaccel): trap 7 with interrupts disabled
cd9660: RockRidge Extension

>Description:
Sporadically (5 times in the last two weeks, including 3 times on one day), I get the below errors on one of my two disks.

(da1:ahc0:0:1:0): SCB 0x1d - timed out while idle, SEQADDR == 0x5
STACK == 0x13, 0x174, 0x15e, 0x174
SXFRCTL0 == 0x80
SCB count = 110
QINFIFO entries: 34 18 46 1 19 31 52 20 33 9 3 67 57 45 0 30 54 22 50 40 23 8 36 2 32 44 35 5 17 11 28 10 101 15 51 26 6
Waiting Queue entries: 11:66
Disconnected Queue entries: 17:39 27:29
QOUTFIFO entries:
Sequencer Free SCB List: 20 2 0 28 14 10 29 31 15 24 7 19 6 23 18 21 12 26 13 22 4 30 9 3 16 8 25 1 5
Pending list: 6 26 51 15 101 10 28 11 17 5 35 44 32 2 36 8 23 40 50 22 54 30 0 45 57 67 3 9 33 20 52 31 19 1 46 18 34 66 39 29
Kernel Free SCB list: 24 58 25 47 59 55 27 42 4 49 3 8 37 43 21 41 53 48 16 12 69 56 68 13 83 14 82 81 80 99 98 97 96 95 94 93 92 91 90 109 108 107 106 105 104 103 102 65 84 85 86 87 88 89 70 71 72 73 74 75 76 77 78 79 60 61 62 63 64 100
sg[0] - Addr 0x1a608800 : Length 1024
(da1:ahc0:0:1:0): SCB 29: Immediate reset. Flags = 0x4040
(da1:ahc0:0:1:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 40 SCBs aborted

After looking for similar problems in the GNATS database, I saw suggestions to disable tagged queueing, which I then did on both disks (using camcontrol). I didn't see the problem for a while, so I thought it had been taken care of, but today I got the following:

(da0:ahc0:0:0:0): SCB 0x8 - timed out while idle, SEQADDR == 0x3e
STACK == 0x1, 0x1, 0x1, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 8 14
Waiting Queue entries:
Disconnected Queue entrties:
QOUTFIFO entries:
Sequencer Free SCB List: 1 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Pending list: 14 8
Kernel Free SCB list: 15 16 17 18 18 0 1 2 3 4 5 6 7 13 12 11 10
Untagged Q(0): 8
Untagged Q(1): 14
sg[0] - Addr 0x3c381000 : Length 4096
sg[1] - Addr 0x35ce2000 : Length 2048
(da0:ahc0:0:0:0): SCB 8: Immediate reset. Flags = 0x6040
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
(da0:ahc0:0:0:0): SCB 0x9 - timed out while idle, SEQADDR == 0x3e
STACK == 0x1, 0x1, 0x1, 0x1
SXFRCTL0 == 0x80
SCB count = 20
QINFIFO entries: 9 14
Waiting Queue entries:
Disconnected Queue entrties:
QOUTFIFO entries:
Sequencer Free SCB List: 1 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Pending list: 14 9
Kernel Free SCB list: 15 16 17 18 18 0 1 2 3 4 5 6 7 13 12 11 10
Untagged Q(0): 9
Untagged Q(1): 14
sg[0] - Addr 0x3c381000 : Length 4096
sg[1] - Addr 0x35ce2000 : Length 2048
(da0:ahc0:0:0:0): SCB 8: Immediate reset. Flags = 0x6040
(da0:ahc0:0:0:0): no longer in timeout, status = 34b
ahc0: Issued Channel A Bus Reset. 2 SCBs aborted

I'm somewhat new to PC-type hardware, so this may be nothing, but are the two channels on the ahc's _supposed_ to have the same IRQ? I couldn't find a way to alter either ahc's IRQ from either the system or SCSI BIOS, so I'm assuming they're set up correctly. Given that there was no activity on the other bus (nothing in either the cd-writer or zip drive) at the time of the problems, I don't believe it's likely to be simply an IRQ issue.

>How-To-Repeat:
The problems seem to occur most frequently when there's heavy disk activity, but I can't seem to reproduce them on demand.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message