From: Charles Sprickman
Date: Tue, 10 Feb 2009 02:14:13 -0500 (EST)
To: Scott Long
Cc: freebsd-scsi@freebsd.org
Subject: Re: 7.1 Panic on degraded disk w/mpt

On Tue, 10 Feb 2009, Charles Sprickman wrote:

> On Mon, 9 Feb 2009, Scott Long wrote:
>
>> Charles Sprickman wrote:
>>> (posted on -stable already, no takers - added info: full dmesg, crash info
>>> from panic when array finished rebuilding, some comments on dmesg)
>>>
>>> Howdy,
>>>
>>> I dug around and can't find a PR on this, and the only other report I saw
>>> was in this mailing list post that has no replies:
>>>
>>> http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html
>>>
>>> The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:
>>>
>>> mpt0: port 0xec00-0xecff mem
>>> 0xfe9fc000-0xfe9fffff,0xfe9e0000-0xfe9effff irq 16 at device 8.0 on pci2
>>> mpt0: MPI Version=1.5.13.0
>>>
>>> The panic is repeatable by forcing the array into a degraded state. When
>>> the array finishes rebuilding, the box also panics.
>>>
>>> Here's my best shot at getting info out of kgdb (panic on array going to
>>> degraded state):
>>
>> I wonder if the MPT card is temporarily detaching and then reattaching
>> the logical drive when the rebuild completes.
>
> IIRC, just before the panic there is a bunch of CAM debug splattered across
> the monitor. I can run down to the garage and snap a few pics of the monitor
> after detaching a drive.

OK, some more info here. I wanted to be safe, so I brought the machine
down to single user and unmounted everything but /. It did not panic on
the drive being removed. So perhaps a quiet filesystem = no panic.

Here's what gets spit out on the console:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): No longer configured
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(repeats with probe # increasing...)
(probe1:mpt0:1:1:0): CAM Status 0x19
(probe1:mpt0:1:1:0): Retrying Command
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(pass1:mpt0:1:0:0): lost device
(pass1:mpt0:1:0:0): removing device entry

So it does appear that at the very least the mpt driver is removing the
pass device for that drive, right?

And on reattach:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: Volume(0:1:0): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:1): Status ( Out-Of-Sync )
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(rinse, repeat)
pass1 at mpt0 bus 1 target 0 lun 0
pass1: Fixed unknown SCSI-5 device
pass1: Serial Number 5QD56ZXC
pass1: 300.000MB/s transfers
pass1: Command Queueing Enabled
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled Re-Syncing )
mpt0:vol0(mpt0:0:0): High Priority Re-Sync
mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining

I'm betting it will panic again in a few hours when the rebuild finishes.

I'll try the detach again tomorrow with all the filesystems mounted and
I'll make sure there's some pending writes when I detach. If I see
anything interesting before the panic message on screen, I'll grab it.

Thanks,

Charles