Date:      Tue, 10 Feb 2009 02:14:13 -0500 (EST)
From:      Charles Sprickman <spork@bway.net>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: 7.1 Panic on degraded disk w/mpt
Message-ID:  <alpine.OSX.2.00.0902100206490.37588@toasty.nat.fasttrackmonkey.com>
In-Reply-To: <alpine.OSX.2.00.0902100135290.37588@toasty.nat.fasttrackmonkey.com>
References:  <alpine.OSX.2.00.0902100104170.37588@toasty.nat.fasttrackmonkey.com> <49911C68.6030203@samsco.org> <alpine.OSX.2.00.0902100135290.37588@toasty.nat.fasttrackmonkey.com>

On Tue, 10 Feb 2009, Charles Sprickman wrote:

> On Mon, 9 Feb 2009, Scott Long wrote:
>
>> Charles Sprickman wrote:
>>> (posted on -stable already, no takers - added info: full dmesg, crash info 
>>> from panic when array finished rebuilding, some comments on dmesg)
>>> 
>>> Howdy,
>>> 
>>> I dug around and can't find a PR on this, and the only other report I saw 
>>> was in this mailing list post that has no replies:
>>> 
>>> http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html
>>> 
>>> The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:
>>> 
>>> mpt0: <LSILogic SAS/SATA Adapter> port 0xec00-0xecff mem 
>>> 0xfe9fc000-0xfe9fffff,0xfe9e0000-0xfe9effff irq 16 at device 8.0 on pci2
>>> mpt0: MPI Version=1.5.13.0
>>> 
>>> The panic is repeatable by forcing the array into a degraded state.  When 
>>> the array finishes rebuilding, the box also panics.
>>> 
>>> Here's my best shot at getting info out of kgdb (panic on array going to 
>>> degraded state):
>> 
>> I wonder if the MPT card is temporarily detaching and then reattaching
>> the logical drive when the rebuild completes.
>
> IIRC, just before the panic there is a bunch of CAM debug splattered across 
> the monitor.  I can run down to the garage and snap a few pics of the monitor 
> after detaching a drive.

OK, some more info here.  I wanted to be safe, so I brought the machine 
down to single-user mode and unmounted everything but /.  It did not panic 
when the drive was removed.  So perhaps a quiet filesystem means no panic.

Here's what gets spit out on the console:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): No longer configured
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(repeats with probe # increasing...)
(probe1:mpt0:1:1:0): CAM Status 0x19
(probe1:mpt0:1:1:0): Retrying Command
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(pass1:mpt0:1:0:0): lost device
(pass1:mpt0:1:0:0): removing device entry
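Most of that output is "Unhandled Event Notify Frame" noise.  To see which events the controller actually sent, here is a minimal sketch that tallies the event codes from a captured console log and labels them.  The names are my assumption of the matching MPI_EVENT_* constants in LSI's MPI 1.5 headers (I believe mpi_ioc.h under sys/dev/mpt/mpilib); check them against your own source tree before relying on them.  It also maps the "error 22" from the probe periphs to its errno symbol, EINVAL.

```python
import errno
import re
from collections import Counter

# Event codes seen in this log. Names are an ASSUMED mapping to the
# MPI_EVENT_* constants in LSI's MPI 1.5 headers; verify before use.
MPI_EVENT_NAMES = {
    0x12: "MPI_EVENT_SAS_PHY_LINK_STATUS",
    0x15: "MPI_EVENT_IR2",
    0x16: "MPI_EVENT_SAS_DISCOVERY",
    0x21: "MPI_EVENT_LOG_ENTRY_ADDED",
}

EVENT_RE = re.compile(r"Unhandled Event Notify Frame\. Event 0x([0-9a-fA-F]+)")
ERROR_RE = re.compile(r"\): error (\d+)$")


def tally_events(log_text):
    """Count 'Unhandled Event Notify Frame' lines per MPI event code."""
    return Counter(int(m.group(1), 16) for m in EVENT_RE.finditer(log_text))


def tally_errors(log_text):
    """Count CAM 'error N' lines, keyed by errno symbol (22 -> EINVAL)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            n = int(m.group(1))
            counts[errno.errorcode.get(n, str(n))] += 1
    return counts


def describe(code):
    """Return the assumed MPI event name for a code, or 'unknown'."""
    return MPI_EVENT_NAMES.get(code, "unknown")


if __name__ == "__main__":
    sample = (
        "mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).\n"
        "mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).\n"
        "mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).\n"
        "(probe0:mpt0:1:0:0): error 22\n"
        "(probe0:mpt0:1:0:0): Unretryable Error\n"
    )
    for code, n in sorted(tally_events(sample).items()):
        print(f"0x{code:02x} {describe(code)}: {n}")
    print(dict(tally_errors(sample)))  # {'EINVAL': 1}
```

Feeding it the whole detach log makes it easy to compare against the reattach log and see whether the mix of events differs when the box panics.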

So it does appear that at the very least the mpt driver is removing the 
pass device for that drive, right?
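For reference, the parenthesised prefixes on the CAM lines are device paths of the form (periphN:simN:bus:target:lun), so "(pass1:mpt0:1:0:0)" is the pass1 peripheral on bus 1, target 0, LUN 0 of mpt0, the same address the reattach log reports as the drive's pass-thru device.  The "(mpt0:vol0:1)" tags come from the mpt driver itself and use a different format.  A small sketch that splits the CAM form; parse_cam_path and CamPath are hypothetical helpers for illustration only.

```python
import re
from typing import NamedTuple, Optional


class CamPath(NamedTuple):
    periph: str  # peripheral driver + unit, e.g. "pass1" or "probe2"
    sim: str     # SIM (controller driver) + unit, e.g. "mpt0"
    bus: int
    target: int
    lun: int


# Matches "(periph:sim:bus:target:lun)" at the start of a console line.
CAM_PATH_RE = re.compile(r"^\((\w+?\d+):(\w+?\d+):(\d+):(\d+):(\d+)\)")


def parse_cam_path(line: str) -> Optional[CamPath]:
    """Return the CAM path prefix of a console line, or None if absent.

    mpt's own "(mpt0:vol0:1)" tags have only three fields and do not match.
    """
    m = CAM_PATH_RE.match(line)
    if m is None:
        return None
    periph, sim, bus, target, lun = m.groups()
    return CamPath(periph, sim, int(bus), int(target), int(lun))


if __name__ == "__main__":
    print(parse_cam_path("(pass1:mpt0:1:0:0): lost device"))
    # CamPath(periph='pass1', sim='mpt0', bus=1, target=0, lun=0)
```

Read this way, the probe0, probe2, probe3, ... errors are CAM scanning targets 0, 2, 3, ... on the pass-thru bus 1, where only target 0 has a disk behind it.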

And on reattach:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: Volume(0:1:0): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:1): Status ( Out-Of-Sync )
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(rinse, repeat)

pass1 at mpt0 bus 1 target 0 lun 0
pass1: <ATA ST3750640NS G> Fixed unknown SCSI-5 device
pass1: Serial Number             5QD56ZXC
pass1: 300.000MB/s transfers
pass1: Command Queueing Enabled
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled Re-Syncing )
mpt0:vol0(mpt0:0:0): High Priority Re-Sync
mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining
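As a sanity check on the resync counter: assuming the controller counts 512-byte blocks (an assumption; the log does not say), 1464842240 blocks is about 750 GB, which lines up with the 750 GB ST3750640NS drives, so the resync starts with the entire mirror left to copy.  A minimal sketch of the arithmetic:

```python
BLOCK_SIZE = 512  # ASSUMED sector size in bytes; the log does not state it


def blocks_to_bytes(blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a block count from the resync status line into bytes."""
    return blocks * block_size


def resync_fraction_done(remaining: int, total: int) -> float:
    """Fraction of the mirror already resynced (0.0 at the start)."""
    return (total - remaining) / total


if __name__ == "__main__":
    total = 1464842240  # "1464842240 of 1464842240 blocks remaining"
    size = blocks_to_bytes(total)
    print(size)                        # 749999226880
    print(f"{size / 1e9:.1f} GB")      # 750.0 GB (decimal)
    print(f"{size / 2**30:.1f} GiB")   # 698.5 GiB
    print(resync_fraction_done(total, total))  # 0.0
```

Plugging later "N of 1464842240 blocks remaining" lines into resync_fraction_done gives the progress right before any panic at the end of the rebuild.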

I'm betting it will panic again in a few hours when the rebuild finishes.

I'll try the detach again tomorrow with all the filesystems mounted, and 
I'll make sure there are some pending writes when I detach.  If I see 
anything interesting on screen before the panic message, I'll grab it.

Thanks,

Charles


