Date:      Sun, 19 Apr 1998 12:00:22 -0700 (PDT)
From:      rtm@viaweb.com
To:        freebsd-gnats-submit@FreeBSD.ORG
Subject:   kern/6351: DPT RAID controller stops working under heavy load.
Message-ID:  <199804191900.MAA19047@hub.freebsd.org>

>Number:         6351
>Category:       kern
>Synopsis:       DPT RAID controller stops working under heavy load.
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Apr 19 12:10:01 PDT 1998
>Last-Modified:
>Originator:     Robert Morris
>Organization:
Viaweb
>Release:        2.2.6
>Environment:
FreeBSD 2.2.6-RELEASE #8: Sat Apr 18 12:08:03 EDT 1998
    rtm@bab-el-ehr.viaweb.com:/c2/rtm/sys-2.2.6/compile/DPT
CPU: Pentium Pro (331.92-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x650  Stepping=0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,<b16>,<b17>,MMX,<b24>>
real memory  = 134217728 (131072K bytes)
avail memory = 119980032 (117168K bytes)
DPT:  RAID Manager driver, Version 1.0.1
Probing for devices on PCI bus 0:
chip0 <generic PCI bridge (vendor=8086 device=7180 subclass=0)> rev 3 on pci0:0:0
chip1 <generic PCI bridge (vendor=8086 device=7181 subclass=4)> rev 3 on pci0:1:0
chip2 <Intel 82371AB PCI-ISA bridge> rev 1 on pci0:7:0
chip3 <Intel 82371AB IDE interface> rev 1 on pci0:7:1
chip4 <Intel 82371AB USB interface> rev 1 int d irq ?? on pci0:7:2
chip5 <Intel 82371AB Power management controller> rev 1 on pci0:7:3
vga0 <VGA-compatible display device> rev 0 int a irq 11 on pci0:15:0
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 4 int a irq 9 on pci0:16:0
fxp0: Ethernet address 00:a0:c9:b0:14:5c
DPT:  PCI SCSI HBA Driver, version 1.2.4
dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 10 on pci0:18:0
dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 
      on port ef90 with 458753MB Write-Back cache.  LED = 0000 0000 
dpt0: Enabled Options:
      Verify Lost Transactions
      Precisely Track State Transitions
      Collect Metrics
      Handle Timeouts
(dpt0:0:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
sd0(dpt0:0:0): Direct-Access 34731MB (71130368 512 byte sectors)
(dpt0:6:0): "HP C1537A L708" type 1 removable SCSI 2
st0(dpt0:6:0): Sequential-Access density code 0x25, variable blocks, write-enabled
Probing for devices on PCI bus 1:
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
ed0 not found at 0x300
sio0 at 0x3f8-0x3ff irq 4 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
sio4 not found at 0x2f0
sio5 not found at 0x3e0
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
psm0 not found at 0x60
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC22100H>
wd0: 2014MB (4124736 sectors), 4092 cyls, 16 heads, 63 S/T, 512 B/S
wdc1 not found at 0x170
bt0 not found at 0x330
ep0 not found at 0x300
npx0 flags 0x1 on motherboard
npx0: INT 16 interface
changing root device to wd0s1a

>Description:
I have a DPT PM3334UW with two buses and five drives (three Seagate
ST39173W and two Seagate ST19171W), all in a RAID-5 array.
Under heavy load, the DPT driver or board stops completing requests.
The DPT Busy LED stays on permanently, and the Write LED blinks
once per second. Here's the dmesg output:

dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 13159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 13158190usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 13157239usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 13109499usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (23159566)
		gets another chance(1/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (33157546)
		gets another chance(1/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (43156142)
		gets another chance(1/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (53107731)
		gets another chance(1/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 63159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 63158292usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 63157343usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 63109602usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (73159696)
		gets another chance(2/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (83157548)
		gets another chance(2/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (93156014)
		gets another chance(2/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (103107728)
		gets another chance(2/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 113159565usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 113158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 113157214usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 113109475usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (123159566)
		gets another chance(3/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (133157554)
		gets another chance(3/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (143156017)
		gets another chance(3/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (153108177)
		gets another chance(3/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 163159569usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 163158330usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 163157380usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 163109644usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (173159565)
		gets another chance(4/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (183157548)
		gets another chance(4/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (193156018)
		gets another chance(4/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (203107728)
		gets another chance(4/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 213159566usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 213158186usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 213157231usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 213109492usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (223159567)
		gets another chance(5/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (233157547)
		gets another chance(5/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (243156016)
		gets another chance(5/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (253107729)
		gets another chance(5/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 263159570usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 263158162usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 263157216usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 263109478usec
dpt0 ERROR: Stale 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (273159565)
		gets another chance(6/5)
dpt0 ERROR: Stale 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 (283157548)
		gets another chance(6/5)
dpt0 ERROR: Stale 91997 (Test Unit Ready [7.24]) on c0b0t0u0 (293156014)
		gets another chance(6/5)
dpt0 ERROR: Stale 92003 (Test Unit Ready [7.24]) on c0b0t0u0 (303107729)
		gets another chance(6/5)
dpt0 ERROR: Marking 91995 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 313159568usec
dpt0 ERROR: Marking 91996 (Prevent/Allow Medium Removal [7.14]) on c0b0t0u0 
            as late after 313158207usec
dpt0 ERROR: Marking 91997 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 313157257usec
dpt0 ERROR: Marking 92003 (Test Unit Ready [7.24]) on c0b0t0u0 
            as late after 313109518usec
dpt0 ERROR: Destroying stale 91995 (Prevent/Allow Medium Removal [7.14])
		on c0b0t0u0 (323159567/7)
dpt0 ERROR: Destroying stale 91996 (Prevent/Allow Medium Removal [7.14])
		on c0b0t0u0 (333157547/7)
dpt0 ERROR: Destroying stale 91997 (Test Unit Ready [7.24])
		on c0b0t0u0 (343156017/7)
dpt0 ERROR: Destroying stale 92003 (Test Unit Ready [7.24])
		on c0b0t0u0 (353107733/7)

>How-To-Repeat:
The problem shows up within a few minutes if I run 39 processes
that read random blocks from the raw disk while, at the same time,
one process repeatedly truncates a file and writes 200 MB to it.
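
For anyone trying to reproduce this, the load can be approximated with
something like the sketch below. It is not the original test program:
the function names, the 64 KB write chunk, and the per-process read
count are assumptions chosen for illustration. The 512-byte block size
comes from the sd0 probe line, and the device size has to be passed in
because a raw device does not report a file size.

```python
import os
import random
from multiprocessing import Process

BLOCK_SIZE = 512                    # sector size from the sd0 probe line
WRITE_TOTAL = 200 * 1024 * 1024     # 200 MB per rewrite pass
CHUNK = 64 * 1024                   # write granularity (assumed)

def random_reads(path, count, size_bytes, seed=None):
    """Read `count` sector-aligned blocks at random offsets; return bytes read."""
    rng = random.Random(seed)
    blocks = size_bytes // BLOCK_SIZE
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(count):
            os.lseek(fd, rng.randrange(blocks) * BLOCK_SIZE, os.SEEK_SET)
            total += len(os.read(fd, BLOCK_SIZE))
    finally:
        os.close(fd)
    return total

def rewrite_file(path, total=WRITE_TOTAL, passes=1):
    """Truncate `path` and write `total` bytes to it, `passes` times over."""
    buf = b"\0" * CHUNK
    for _ in range(passes):
        with open(path, "wb") as f:     # opening with "wb" truncates the file
            left = total
            while left > 0:
                n = min(left, CHUNK)
                f.write(buf[:n])
                left -= n
    return os.path.getsize(path)

def run_load(raw_dev, raw_size, scratch, readers=39, reads=100000, passes=100):
    """Start `readers` random-read processes plus one rewriting process."""
    procs = [Process(target=random_reads, args=(raw_dev, reads, raw_size))
             for _ in range(readers)]
    procs.append(Process(target=rewrite_file, args=(scratch, WRITE_TOTAL, passes)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Calling run_load() with the raw device for the array, its size in bytes,
and a scratch file on a filesystem on the same array should give roughly
the mix described above: 39 random readers and one writer.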
>Fix:
I don't know how to fix it. If I turn off Tagged Command Queuing
using the DPT's ^D boot ROM software, the problem takes longer to
show up (i.e., after 10 minutes of heavy load rather than just 1).

>Audit-Trail:
>Unformatted:
