Date: Tue, 4 May 2010 22:02:43 -0500 (CDT)
From: Wes Morgan <morganw@chemikals.org>
To: Kurt Lidl
Cc: freebsd-stable@freebsd.org
Subject: Re: raidz2 recovery problem on 8.0p2

On Mon, 3 May 2010, Kurt Lidl wrote:

> I have a 12GB memory machine, with a mpt controller in it, running a
> ZFS raidz2 for (test) data storage. The system also has a ZFS mirror
> in place for the OS, home directories, etc.
>
> I manually failed one of the disks in the JBOD shelf and watched as
> the mpt controller started logging errors. Ultimately, I tried to
> reboot the machine, but it panic'd instead of rebooting cleanly. It
> failed to crashdump too (got about 200MB into the dump and stopped).
>
> Upon reboot, I saw that zfs thought there were two da6 disk devices.
> Which was strange, since at this point, the machine should have had
> da0 through da6. I issued a 'zpool clear media da6' command, but
> that didn't resolve anything.
>
> Then I plugged the drive back into the JBOD and rebooted.
> Now I see the following:
>
> user@host: zpool status media
>   pool: media
>  state: DEGRADED
> status: One or more devices could not be used because the label is
>         missing or invalid.  Sufficient replicas exist for the pool
>         to continue functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         media       DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             da0     ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>             da6     FAULTED      0    98     0  corrupted data
>
> errors: No known data errors
>
> Note that there are *two* da6 devices listed, at least from zpool's
> point of view.
> A dmesg reports this:
>
> da0 at mpt0 bus 0 target 8 lun 0
> da0: Fixed Direct Access SCSI-5 device
> da0: 300.000MB/s transfers
> da0: Command Queueing enabled
> da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da1 at mpt0 bus 0 target 9 lun 0
> da1: Fixed Direct Access SCSI-5 device
> da1: 300.000MB/s transfers
> da1: Command Queueing enabled
> da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da2 at mpt0 bus 0 target 10 lun 0
> da2: Fixed Direct Access SCSI-5 device
> da2: 300.000MB/s transfers
> da2: Command Queueing enabled
> da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da3 at mpt0 bus 0 target 11 lun 0
> da3: Fixed Direct Access SCSI-5 device
> da3: 300.000MB/s transfers
> da3: Command Queueing enabled
> da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da4 at mpt0 bus 0 target 12 lun 0
> da4: Fixed Direct Access SCSI-5 device
> da4: 300.000MB/s transfers
> da4: Command Queueing enabled
> da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da5 at mpt0 bus 0 target 13 lun 0
> da5: Fixed Direct Access SCSI-5 device
> da5: 300.000MB/s transfers
> da5: Command Queueing enabled
> da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da6 at mpt0 bus 0 target 14 lun 0
> da6: Fixed Direct Access SCSI-5 device
> da6: 300.000MB/s transfers
> da6: Command Queueing enabled
> da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da7 at mpt0 bus 0 target 15 lun 0
> da7: Fixed Direct Access SCSI-5 device
> da7: 300.000MB/s transfers
> da7: Command Queueing enabled
> da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
>
> Any suggestions about how to get this raid back into a non-degraded state?

Have you tried exporting and importing the pool? If that doesn't work,
what is the output of zdb?
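For reference, a rough sketch of what I mean (the pool name 'media' and
the device da6 are taken from your output above; adjust if yours differ):

  # export, then re-import, so ZFS re-reads the on-disk labels and
  # rebuilds its view of the vdevs
  zpool export media
  zpool import media
  zpool status media

  # if the import doesn't clean things up, look at the labels and the
  # cached configuration next
  zdb                 # dump the cached pool configuration
  zdb -l /dev/da6     # print the ZFS labels on the suspect disk

If the import still shows the re-inserted disk as FAULTED, a
'zpool replace media da6' (as the status output suggests) should kick
off a resilver and bring the raidz2 back to a healthy state.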