Date:      Mon, 12 Mar 2012 13:03:23 -0400 (EDT)
From:      Jonathan Stewart <jonathan@kc8onw.net>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/165982: MPT instability drive resets and losses on FreeBSD 9-stable r232224
Message-ID:  <201203121703.q2CH3NeF002640@storage.kc8onw.net>
Resent-Message-ID: <201203121710.q2CHAAch011891@freefall.freebsd.org>

>Number:         165982
>Category:       kern
>Synopsis:       MPT instability drive resets and losses on FreeBSD 9-stable r232224
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Mar 12 17:10:10 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Jonathan Stewart
>Release:        FreeBSD 9.0-STABLE amd64
>Organization:
>Environment:
System: FreeBSD storage.kc8onw.net 9.0-STABLE FreeBSD 9.0-STABLE #9 r232224: Thu Mar 1 14:07:11 EST 2012 root@storage.kc8onw.net:/usr/obj/usr/src/sys/STORAGE amd64

hostb0@pci0:0:0:0:      class=0x060000 card=0x062415d9 chip=0x01088086 rev=0x09 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 Processor Family DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:       class=0x060400 card=0x062415d9 chip=0x01018086 rev=0x09 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200/2nd Generation Core Processor Family PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
em0@pci0:0:25:0:        class=0x020000 card=0x150215d9 chip=0x15028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82579LM Gigabit Network Connection'
    class      = network
    subclass   = ethernet
ehci0@pci0:0:26:0:      class=0x0c0320 card=0x062415d9 chip=0x1c2d8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pcib2@pci0:0:28:0:      class=0x060400 card=0x062415d9 chip=0x1c108086 rev=0xb5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:4:      class=0x060400 card=0x062415d9 chip=0x1c188086 rev=0xb5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 5'
    class      = bridge
    subclass   = PCI-PCI
ehci1@pci0:0:29:0:      class=0x0c0320 card=0x062415d9 chip=0x1c268086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pcib4@pci0:0:30:0:      class=0x060401 card=0x062415d9 chip=0x244e8086 rev=0xa5 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801 PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:      class=0x060100 card=0x062415d9 chip=0x1c548086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'C204 Chipset Family LPC Controller'
    class      = bridge
    subclass   = PCI-ISA
ahci0@pci0:0:31:2:      class=0x010601 card=0x062415d9 chip=0x1c028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
none0@pci0:0:31:3:      class=0x0c0500 card=0x062415d9 chip=0x1c228086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family SMBus Controller'
    class      = serial bus
    subclass   = SMBus
mpt0@pci0:1:0:0:        class=0x010000 card=0x31401000 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'SAS1068E PCI-Express Fusion-MPT SAS'
    class      = mass storage
    subclass   = SCSI
em1@pci0:2:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82572EI Gigabit Ethernet Controller (Copper)'
    class      = network
    subclass   = ethernet
em2@pci0:3:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
vgapci0@pci0:4:3:0:     class=0x030000 card=0x062415d9 chip=0x0532102b rev=0x0a hdr=0x00
    vendor     = 'Matrox Graphics, Inc.'
    device     = 'MGA G200eW WPCM450'
    class      = display
    subclass   = VGA

>Description:
	I upgraded to 9-stable and, around the same time, had drive failures.
Now, under heavy I/O to drives attached to the mpt controller, I get errors
such as:
(da3:mpt0:0:14:0): WRITE(10). CDB: 2a 0 42 0 45 f0 0 0 8 0
(da3:mpt0:0:14:0): CAM status: SCSI Status Error
(da3:mpt0:0:14:0): SCSI status: Check Condition
(da3:mpt0:0:14:0): SCSI sense: MEDIUM ERROR asc:14,1 (Record not found)
and
(da7:mpt0:0:13:0): SCSI status error
(da7:mpt0:0:13:0): WRITE(10). CDB: 2a 0 2a 0 a6 10 0 0 28 0
(da7:mpt0:0:13:0): CAM status: SCSI Status Error
(da7:mpt0:0:13:0): SCSI status: Check Condition
(da7:mpt0:0:13:0): SCSI sense: ABORTED COMMAND asc:0,0 (No additional sense information)
(da7:mpt0:0:13:0): Retrying command (per sense data)
and
(da7:mpt0:0:13:0): SCSI status error
(da7:mpt0:0:13:0): READ(10). CDB: 28 0 29 e d7 0 0 0 38 0
(da7:mpt0:0:13:0): CAM status: SCSI Status Error
(da7:mpt0:0:13:0): SCSI status: Check Condition
(da7:mpt0:0:13:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da7:mpt0:0:13:0): Retrying command (per sense data)
(da7:mpt0:0:13:0): CAM status 0x18
(da7:mpt0:0:13:0): Retrying command
(da7:mpt0:0:13:0): CAM status 0x18
and
(da7:mpt0:0:13:0): SCSI status error
(da7:mpt0:0:13:0): WRITE(10). CDB: 2a 0 5c 0 2c b0 0 0 8 0
(da7:mpt0:0:13:0): CAM status: SCSI Status Error
(da7:mpt0:0:13:0): SCSI status: Check Condition
(da7:mpt0:0:13:0): SCSI sense: ABORTED COMMAND asc:0,0 (No additional sense information)
(da7:mpt0:0:13:0): Error 5, Retries exhausted
mpt0: request 0xffffff8001a68060:30428 timed out for ccb 0xfffffe00072d6800 (req->ccb 0xfffffe00072d6800)
mpt0: request 0xffffff8001a73cd0:30429 timed out for ccb 0xfffffe0007453800 (req->ccb 0xfffffe0007453800)
mpt0: attempting to abort req 0xffffff8001a68060:30428 function 0
mpt0: request 0xffffff8001a682a0:30430 timed out for ccb 0xfffffe001202c800 (req->ccb 0xfffffe001202c800)
mpt0: request 0xffffff8001a745d0:30431 timed out for ccb 0xfffffe0007408000 (req->ccb 0xfffffe0007408000)
mpt0: request 0xffffff8001a72da0:30432 timed out for ccb 0xfffffe0006842800 (req->ccb 0xfffffe0006842800)
mpt0: request 0xffffff8001a73a00:30433 timed out for ccb 0xfffffe0007fcd000 (req->ccb 0xfffffe0007fcd000)
mpt0: request 0xffffff8001a66710:30434 timed out for ccb 0xfffffe000740a000 (req->ccb 0xfffffe000740a000)
mpt0: mpt_wait_req(1) timed out
mpt0: mpt_recover_commands: abort timed-out. Resetting controller
mpt0: mpt_cam_event: 0x80
mpt0: Unhandled Event Notify Frame. Event 0xffffff80 (ACK not required).
mpt0: completing timedout/aborted req 0xffffff8001a68060:30428
mpt0: completing timedout/aborted req 0xffffff8001a73cd0:30429
mpt0: completing timedout/aborted req 0xffffff8001a682a0:30430
mpt0: completing timedout/aborted req 0xffffff8001a745d0:30431
mpt0: completing timedout/aborted req 0xffffff8001a72da0:30432
mpt0: completing timedout/aborted req 0xffffff8001a73a00:30433
mpt0: completing timedout/aborted req 0xffffff8001a66710:30434
(da7:mpt0:0:13:0): Bus Reset issued
(da7:mpt0:0:13:0): Retrying command
and finally
(da7:mpt0:0:13:0): SCSI status error
(da7:mpt0:0:13:0): READ(10). CDB: 28 0 29 e d6 f8 0 0 8 0
(da7:mpt0:0:13:0): CAM status: SCSI Status Error
(da7:mpt0:0:13:0): SCSI status: Check Condition
(da7:mpt0:0:13:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da7:mpt0:0:13:0): Info: 0x290ed6f8
(da7:mpt0:0:13:0): Error 5, Unretryable error
mpt0: request 0xffffff8001a6bff0:30904 timed out for ccb 0xfffffe0007453800 (req->ccb 0xfffffe0007453800)
mpt0: attempting to abort req 0xffffff8001a6bff0:30904 function 0
mpt0: mpt_send_handshake_cmd: db ignored
mpt0: soft reset failed: device not running
mpt0: WARNING - Failed hard reset! Trying to initialize anyway.
mpt0: mpt_cam_event: 0xff
mpt0: Unhandled Event Notify Frame. Event 0xffffffff (ACK not required).
mpt0: completing timedout/aborted req 0xffffff8001a6bff0:30904
(da0:mpt0:0:1:0): Bus Reset issued
(da0:mpt0:0:1:0): Retrying command

Eventually the system reaches a state where no disk I/O completes at all and
multiple drives are lost, and I have to reset the machine.
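
To help correlate the failing commands above, here is a minimal sketch that decodes the CDB bytes CAM prints for READ(10) and WRITE(10). It assumes the standard 10-byte CDB layout (opcode, flags, 4-byte big-endian LBA, group number, 2-byte big-endian transfer length, control). The function name `decode_rw10` is hypothetical and not part of any FreeBSD tool.

```python
# Hypothetical helper: decode the "CDB: ..." bytes from CAM error messages.
# Assumes the standard 10-byte READ(10)/WRITE(10) command layout.

OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_text):
    """Return (command name, starting LBA, block count) for a 10-byte CDB.

    cdb_text is the space-separated hex byte string as printed by CAM,
    for example "28 0 29 e d6 f8 0 0 8 0".
    """
    cdb = [int(byte, 16) for byte in cdb_text.split()]
    if len(cdb) != 10:
        raise ValueError("expected exactly 10 CDB bytes")
    if cdb[0] not in OPCODES:
        raise ValueError("not a READ(10) or WRITE(10) opcode: 0x%02x" % cdb[0])
    # Bytes 2..5 hold the logical block address, most significant byte first.
    lba = int.from_bytes(bytes(cdb[2:6]), "big")
    # Bytes 7..8 hold the transfer length in blocks, most significant byte first.
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")
    return OPCODES[cdb[0]], lba, blocks
```

Applied to the final da7 failure, `decode_rw10("28 0 29 e d6 f8 0 0 8 0")` gives a READ(10) of 8 blocks starting at LBA 0x290ed6f8, which agrees with the `Info: 0x290ed6f8` line in the sense data. The earlier READ(10) on da7 that saw the UNIT ATTENTION (`28 0 29 e d7 0 0 0 38 0`) decodes to 56 blocks starting at LBA 0x290ed700, the block range immediately following.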

>How-To-Repeat:
	Place a heavy I/O load on an mpt controller with SATA drives.
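
One possible way to generate such a load is sketched below. This is an illustrative assumption, not the workload that produced the errors above: the function name `io_load`, the scratch file name, and the default sizes are all hypothetical. It writes a large file to a directory on a filesystem backed by the mpt-attached drives, forces it to disk, and reads it back.

```python
import os


def io_load(directory, file_size=1 << 30, chunk_size=1 << 20, passes=1):
    """Write and read back a scratch file; return total bytes transferred.

    Hypothetical load generator. `directory` is assumed to live on a
    filesystem backed by the drives under test.
    """
    path = os.path.join(directory, "mpt_load_test.bin")  # assumed scratch name
    chunk = os.urandom(chunk_size)  # random data, reused for every write
    total = 0
    try:
        for _ in range(passes):
            written = 0
            with open(path, "wb") as f:
                while written < file_size:
                    # The slice trims the final chunk so the file is exact size.
                    written += f.write(chunk[: file_size - written])
                f.flush()
                os.fsync(f.fileno())  # push the data out to the drives
            total += written
            with open(path, "rb") as f:
                while True:
                    data = f.read(chunk_size)
                    if not data:
                        break
                    total += len(data)
    finally:
        # Remove the scratch file even if an I/O error interrupts the run.
        if os.path.exists(path):
            os.remove(path)
    return total
```

For example, `io_load("/tank/scratch", passes=10)` would write and read 1 GiB ten times, where `/tank/scratch` is a placeholder path. The read-back may be served partly from cache, so a file size larger than system memory is likely needed for the reads to reach the drives.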
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


