Date: Tue, 10 Feb 2009 02:14:13 -0500 (EST)
From: Charles Sprickman <spork@bway.net>
To: Scott Long <scottl@samsco.org>
Cc: freebsd-scsi@freebsd.org
Subject: Re: 7.1 Panic on degraded disk w/mpt
Message-ID: <alpine.OSX.2.00.0902100206490.37588@toasty.nat.fasttrackmonkey.com>
In-Reply-To: <alpine.OSX.2.00.0902100135290.37588@toasty.nat.fasttrackmonkey.com>
References: <alpine.OSX.2.00.0902100104170.37588@toasty.nat.fasttrackmonkey.com>
 <49911C68.6030203@samsco.org>
 <alpine.OSX.2.00.0902100135290.37588@toasty.nat.fasttrackmonkey.com>
On Tue, 10 Feb 2009, Charles Sprickman wrote:

> On Mon, 9 Feb 2009, Scott Long wrote:
>
>> Charles Sprickman wrote:
>>> (posted on -stable already, no takers - added info: full dmesg, crash info
>>> from panic when array finished rebuilding, some comments on dmesg)
>>>
>>> Howdy,
>>>
>>> I dug around and can't find a PR on this, and the only other report I saw
>>> was in this mailing list post that has no replies:
>>>
>>> http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html
>>>
>>> The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:
>>>
>>> mpt0: <LSILogic SAS/SATA Adapter> port 0xec00-0xecff mem
>>> 0xfe9fc000-0xfe9fffff,0xfe9e0000-0xfe9effff irq 16 at device 8.0 on pci2
>>> mpt0: MPI Version=1.5.13.0
>>>
>>> The panic is repeatable by forcing the array into a degraded state. When
>>> the array finishes rebuilding, the box also panics.
>>>
>>> Here's my best shot at getting info out of kgdb (panic on array going to
>>> degraded state):
>>
>> I wonder if the MPT card is temporarily detaching and then reattaching
>> the logical drive when the rebuild completes.
>
> IIRC, just before the panic there is a bunch of CAM debug splattered across
> the monitor. I can run down to the garage and snap a few pics of the monitor
> after detaching a drive.

OK, some more info here. I wanted to be safe, so I brought the machine
down to single user and unmounted everything but /. It did not panic on
the drive being removed. So perhaps a quiet filesystem = no panic.

Here's what gets spit out on the console:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): No longer configured
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(repeats with probe # increasing...)
(probe1:mpt0:1:1:0): CAM Status 0x19
(probe1:mpt0:1:1:0): Retrying Command
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(pass1:mpt0:1:0:0): lost device
(pass1:mpt0:1:0:0): removing device entry

So it does appear that at the very least the mpt driver is removing the
pass device for that drive, right?

And on reattach:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: Volume(0:1:0): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:1): Status ( Out-Of-Sync )
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(rinse, repeat)
pass1 at mpt0 bus 1 target 0 lun 0
pass1: <ATA ST3750640NS G> Fixed unknown SCSI-5 device
pass1: Serial Number 5QD56ZXC
pass1: 300.000MB/s transfers
pass1: Command Queueing Enabled
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled Re-Syncing )
mpt0:vol0(mpt0:0:0): High Priority Re-Sync
mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining

I'm betting it will panic again in a few hours when the rebuild finishes.

I'll try the detach again tomorrow with all the filesystems mounted and
I'll make sure there's some pending writes when I detach. If I see
anything interesting before the panic message on screen, I'll grab it.

Thanks,

Charles