Date: Sun, 19 Apr 1998 12:00:22 -0700 (PDT)
From: rtm@viaweb.com
To: freebsd-gnats-submit@FreeBSD.ORG
Subject: kern/6351: DPT RAID controller stops working under heavy load.
Message-ID: <199804191900.MAA19047@hub.freebsd.org>
>Number: 6351
>Category: kern
>Synopsis: DPT RAID controller stops working under heavy load.
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sun Apr 19 12:10:01 PDT 1998
>Last-Modified:
>Originator: Robert Morris
>Organization: Viaweb
>Release: 2.2.6
>Environment:
FreeBSD 2.2.6-RELEASE #8: Sat Apr 18 12:08:03 EDT 1998
    rtm@bab-el-ehr.viaweb.com:/c2/rtm/sys-2.2.6/compile/DPT
CPU: Pentium Pro (331.92-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x650 Stepping=0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
real memory = 134217728 (131072K bytes)
avail memory = 119980032 (117168K bytes)
DPT: RAID Manager driver, Version 1.0.1
Probing for devices on PCI bus 0:
chip0 <generic PCI bridge (vendor=8086 device=7180 subclass=0)> rev 3 on pci0:0:0
chip1 <generic PCI bridge (vendor=8086 device=7181 subclass=4)> rev 3 on pci0:1:0
chip2 <Intel 82371AB PCI-ISA bridge> rev 1 on pci0:7:0
chip3 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip4 <Intel 82371AB USB interface> rev 1 int d irq ?? on pci0:7:2
chip5 <Intel 82371AB Power management controller> rev 1 on pci0:7:3
vga0 <VGA-compatible display device> rev 0 int a irq 11 on pci0:15:0
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 4 int a irq 9 on pci0:16:0
fxp0: Ethernet address 00:a0:c9:b0:14:5c
DPT: PCI SCSI HBA Driver, version 1.2.4
dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 10 on pci0:18:0
dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0
      on port ef90 with 458753MB Write-Back cache. LED = 0000 0000
dpt0: Enabled Options:
      Verify Lost Transactions
      Precisely Track State Transitions
      Collect Metrics
      Handle Timeouts
(dpt0:0:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
sd0(dpt0:0:0): Direct-Access 34731MB (71130368 512 byte sectors)
(dpt0:6:0): "HP C1537A L708" type 1 removable SCSI 2
st0(dpt0:6:0): Sequential-Access density code 0x25, variable blocks, write-enabled
Probing for devices on PCI bus 1:
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 not found at 0x300
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
sio4 not found at 0x2f0
sio5 not found at 0x3e0
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC22100H>
wd0: 2014MB (4124736 sectors), 4092 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 not found at 0x170
bt0 not found at 0x330
ep0 not found at 0x300
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
changing root device to wd0s1a
>Description:
I have a DPT PM3334UW with two busses, three Seagate ST39173W drives,
two Seagate ST19171W drives, all in a RAID-5 Array. Under heavy load
the DPT driver or board stops completing requests. The DPT Busy LED
stays on permanently, and the Write LED blinks once per second.

Here's the dmesg output:

dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 13159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 13158190usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 13157239usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 13109499usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (23159566) gets another chance(1/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (33157546) gets another chance(1/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (43156142) gets another chance(1/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (53107731) gets another chance(1/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 63159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 63158292usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 63157343usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 63109602usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (73159696) gets another chance(2/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (83157548) gets another chance(2/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (93156014) gets another chance(2/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (103107728) gets another chance(2/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 113159565usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 113158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 113157214usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 113109475usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (123159566) gets another chance(3/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (133157554) gets another chance(3/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (143156017) gets another chance(3/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (153108177) gets another chance(3/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 163159569usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 163158330usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 163157380usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 163109644usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (173159565) gets another chance(4/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (183157548) gets another chance(4/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (193156018) gets another chance(4/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (203107728) gets another chance(4/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 213159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 213158186usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 213157231usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 213109492usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (223159567) gets another chance(5/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (233157547) gets another chance(5/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (243156016) gets another chance(5/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (253107729) gets another chance(5/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 263159570usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 263158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 263157216usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 263109478usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (273159565) gets another chance(6/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (283157548) gets another chance(6/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (293156014) gets another chance(6/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (303107729) gets another chance(6/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 313159568usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 as late after 313158207usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 313157257usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 as late after 313109518usec
dpt0 ERROR: Destroying stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (323159567/7)
dpt0 ERROR: Destroying stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (333157547/7)
dpt0 ERROR: Destroying stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (343156017/7)
dpt0 ERROR: Destroying stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (353107733/7)
>How-To-Repeat:
The problem shows up in a few minutes if I run 39 processes that read
random blocks from the raw disk, and at the same time one process that
repeatedly truncates a file and writes 200 MB to it.
>Fix:
I don't know how to fix it.
If I turn off Tagged Command Queuing using the DPT's ^D boot ROM
software, the problem takes longer to show up (i.e., after 10 minutes
of heavy load rather than just 1).
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
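
The workload described under >How-To-Repeat can be approximated with a short script. What follows is a minimal sketch, not the submitter's actual test program. The report gives only the process counts (39 readers, 1 writer) and the 200 MB write size; the block size, chunk size, read count, number of rounds, and all function and parameter names here are assumptions chosen for illustration. The raw device path and its usable size must be supplied by the caller, and the block size should be a multiple of the 512-byte sector size shown in the probe output.

```python
import multiprocessing
import os
import random


def random_reader(path, span_bytes, block_size=8192, n_reads=1000, seed=None):
    """Read n_reads blocks at random block-aligned offsets; return bytes read.

    span_bytes is the size of the region to read from (assumed known by
    the caller); it must hold at least one block.
    """
    rng = random.Random(seed)
    n_blocks = span_bytes // block_size
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(n_reads):
            offset = rng.randrange(n_blocks) * block_size
            os.lseek(fd, offset, os.SEEK_SET)
            total += len(os.read(fd, block_size))
    finally:
        os.close(fd)
    return total


def truncate_writer(path, total_bytes=200 * 1024 * 1024,
                    chunk=64 * 1024, rounds=1):
    """Truncate the file and rewrite total_bytes, rounds times; return its size."""
    buf = b"\0" * chunk
    for _ in range(rounds):
        # Opening with "wb" truncates an existing file to zero length.
        with open(path, "wb") as f:
            written = 0
            while written < total_bytes:
                n = min(chunk, total_bytes - written)
                f.write(buf[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # push the data out to the disk
    return os.path.getsize(path)


def run_load(raw_path, span_bytes, scratch_path,
             n_readers=39, n_reads=100000, rounds=10):
    """Run n_readers readers plus one writer concurrently; return exit codes."""
    procs = [
        multiprocessing.Process(
            target=random_reader,
            args=(raw_path, span_bytes, 8192, n_reads),
        )
        for _ in range(n_readers)
    ]
    procs.append(
        multiprocessing.Process(
            target=truncate_writer,
            args=(scratch_path, 200 * 1024 * 1024, 64 * 1024, rounds),
        )
    )
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

To use it, call run_load() with the raw device for the array, the number of bytes of that device to read across, and a scratch file on a filesystem that lives on the same array. Reading a raw device normally requires root privileges.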