From owner-freebsd-stable@FreeBSD.ORG  Mon May  3 20:34:23 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 4D61D1065678
	for <freebsd-stable@freebsd.org>; Mon,  3 May 2010 20:34:23 +0000 (UTC)
	(envelope-from kurt.lidl@cello.com)
Received: from Mail.Fairview-Park.Com (Mail.Fairview-Park.Com [98.141.206.6])
	by mx1.freebsd.org (Postfix) with ESMTP id 07DBB8FC1A
	for <freebsd-stable@freebsd.org>; Mon,  3 May 2010 20:34:22 +0000 (UTC)
Received: from [192.168.8.101] (Kurt.Fairview-Park.Com [192.168.8.101])
	by Mail.Fairview-Park.Com (8.14.3/8.14.3) with ESMTP id o43KL1hI045861
	for <freebsd-stable@freebsd.org>; Mon, 3 May 2010 16:21:10 -0400 (EDT)
	(envelope-from kurt.lidl@cello.com)
X-FVP-rcvd: Kurt.Fairview-Park.Com [192.168.8.101] Mon,
	3 May 2010 16:21:10 -0400 (EDT)
Message-ID: <4BDF302D.9020500@cello.com>
Date: Mon, 03 May 2010 16:21:01 -0400
From: Kurt Lidl <kurt.lidl@cello.com>
Organization: Cello Software
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.9.1.9) Gecko/20100317 Lightning/1.0b1 Thunderbird/3.0.4
MIME-Version: 1.0
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.96 at Mail.Fairview-Park.Com
X-Virus-Status: Clean
Subject: raidz2 recovery problem on 8.0p2
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 03 May 2010 20:34:23 -0000

I have a machine with 12GB of memory and an mpt controller in it, running
a ZFS raidz2 for (test) data storage.  The system also has a ZFS mirror in
place for the OS, home directories, etc.

I manually failed one of the disks in the JBOD shelf and watched as the mpt
controller started logging errors.  Ultimately, I tried to reboot the
machine, but it panicked instead of rebooting cleanly.  The crash dump
failed too (it got about 200MB into the dump and stopped).

Upon reboot, I saw that ZFS thought there were two da6 disk devices,
which was strange, since at this point the machine should have had
only da0 through da6.  I issued a 'zpool clear media da6' command, but
that didn't resolve anything.

Then I plugged the drive back into the JBOD and rebooted.
Now I see the following:

user@host: zpool status media
   pool: media
  state: DEGRADED
status: One or more devices could not be used because the label is missing or
         invalid.  Sufficient replicas exist for the pool to continue
         functioning in a degraded state.
action: Replace the device using 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-4J
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         media       DEGRADED     0     0     0
           raidz2    DEGRADED     0     0     0
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0
             da3     ONLINE       0     0     0
             da4     ONLINE       0     0     0
             da5     ONLINE       0     0     0
             da6     ONLINE       0     0     0
             da6     FAULTED      0    98     0  corrupted data

errors: No known data errors
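
A duplicated device name in a listing like the one above can also be
spotted mechanically.  What follows is only a minimal Python sketch, not
part of any ZFS tooling: the function name duplicate_vdevs and the
trimmed SAMPLE text are illustrative assumptions, and it only looks at
daN names in the config section.

```python
from collections import Counter
import re


def duplicate_vdevs(status_text):
    """Return daN names listed more than once in the config section."""
    names = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
        if not in_config or not stripped:
            continue
        first = stripped.split()[0]
        # Skip the header row, the pool name and the raidz2 line.
        if re.fullmatch(r"da\d+", first):
            names.append(first)
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Trimmed copy of the 'zpool status media' output (assumed layout).
SAMPLE = """\
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da6     FAULTED      0    98     0  corrupted data

errors: No known data errors
"""

print(duplicate_vdevs(SAMPLE))  # ['da6']
```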

Note that there are *two* da6 devices listed, at least from zpool's point
of view.  dmesg, however, reports eight disks, da0 through da7:

da0 at mpt0 bus 0 target 8 lun 0
da0: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 target 9 lun 0
da1: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2 at mpt0 bus 0 target 10 lun 0
da2: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3 at mpt0 bus 0 target 11 lun 0
da3: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da4 at mpt0 bus 0 target 12 lun 0
da4: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da5 at mpt0 bus 0 target 13 lun 0
da5: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da6 at mpt0 bus 0 target 14 lun 0
da6: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da7 at mpt0 bus 0 target 15 lun 0
da7: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
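
To make the mismatch explicit, the disks probed at boot can be compared
with the names the pool lists.  This is a rough Python sketch under
stated assumptions: the helper names probed_disks and missing_from_pool
are hypothetical, the dmesg text is reduced to its "daN at mpt0" lines,
and the pool member list is typed in by hand from the zpool status
output.

```python
import re


def probed_disks(dmesg_text):
    """Return the daN devices that dmesg shows attached to an mpt bus."""
    found = {
        m.group(1)
        for m in re.finditer(r"^(da\d+) at mpt\d+ ", dmesg_text, re.MULTILINE)
    }
    # Sort numerically so da10 would come after da9.
    return sorted(found, key=lambda name: int(name[2:]))


def missing_from_pool(dmesg_text, pool_members):
    """Return probed disks that the pool does not list under any name."""
    members = set(pool_members)
    return [disk for disk in probed_disks(dmesg_text) if disk not in members]


# Reduced dmesg: one attach line per disk, targets 8 through 15.
DMESG = "\n".join(
    f"da{i} at mpt0 bus 0 target {8 + i} lun 0" for i in range(8)
)

# Device names exactly as 'zpool status media' lists them (da6 twice).
POOL = ["da0", "da1", "da2", "da3", "da4", "da5", "da6", "da6"]

print(missing_from_pool(DMESG, POOL))  # ['da7']
```

This only shows which probed disk the pool never mentions; it says
nothing about which physical disk holds which ZFS label.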

Any suggestions about how to get this raidz2 pool back into a non-degraded
state?

For whatever it's worth, 'uname -a' reports:
FreeBSD host.fairview-park.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan  5 21:11:58 UTC 2010     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Thanks for any help.

-Kurt