From: Kurt Lidl
Organization: Cello Software
To: freebsd-stable@freebsd.org
Date: Mon, 03 May 2010 16:21:01 -0400
Subject: raidz2 recovery problem on 8.0p2
Message-ID: <4BDF302D.9020500@cello.com>

I have a machine with 12GB of memory and an mpt controller, running a ZFS raidz2 pool for (test) data storage. The system also has a ZFS mirror for the OS, home directories, etc.

I manually failed one of the disks in the JBOD shelf and watched as the mpt controller started logging errors. Eventually I tried to reboot the machine, but it panicked instead of rebooting cleanly.
It also failed to write a crash dump: it got about 200MB into the dump and stopped.

Upon reboot, I saw that ZFS thought there were two da6 disk devices. That was strange, since at this point the machine should have had only da0 through da6. I issued a 'zpool clear media da6' command, but that didn't resolve anything. Then I plugged the drive back into the JBOD and rebooted. Now I see the following:

user@host: zpool status media
  pool: media
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da6     FAULTED      0    98     0  corrupted data

errors: No known data errors

Note that there are *two* da6 devices listed, at least from zpool's point of view.
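The duplicate entry is easy to miss in a long listing. Here is a small, purely illustrative Python sketch that scans the config section of the zpool status output above and reports any device name listed more than once. The helpers `leaf_devices` and `duplicate_names` are hypothetical and are not part of any ZFS tool. The sketch also assumes that leaf disks are the rows whose names start with "da". That holds for this pool but not in general.

```python
from collections import Counter

# The "config:" section of the zpool status output above, copied verbatim.
ZPOOL_CONFIG = """\
        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da6     FAULTED      0    98     0  corrupted data
"""


def leaf_devices(config_text):
    """Return (name, state) pairs for leaf vdev rows.

    Assumption (hypothetical helper): leaf disks are rows whose first
    column starts with "da"; pool and raidz2 rows are skipped.
    """
    rows = []
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("da"):
            rows.append((fields[0], fields[1]))
    return rows


def duplicate_names(rows):
    """Return, sorted, the device names that appear more than once."""
    counts = Counter(name for name, _ in rows)
    return sorted(name for name, n in counts.items() if n > 1)


rows = leaf_devices(ZPOOL_CONFIG)
print(len(rows), duplicate_names(rows))  # prints: 8 ['da6']
```

On the listing above, the sketch finds eight leaf rows, and da6 is the only name that repeats. It flags the symptom only. It cannot tell which of the two da6 entries refers to which physical disk.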
dmesg reports this:

da0 at mpt0 bus 0 target 8 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 target 9 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2 at mpt0 bus 0 target 10 lun 0
da2: Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3 at mpt0 bus 0 target 11 lun 0
da3: Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da4 at mpt0 bus 0 target 12 lun 0
da4: Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da5 at mpt0 bus 0 target 13 lun 0
da5: Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da6 at mpt0 bus 0 target 14 lun 0
da6: Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da7 at mpt0 bus 0 target 15 lun 0
da7: Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)

Any suggestions about how to get this raidz2 pool back into a non-degraded state?

For whatever it's worth, 'uname -a' reports:

FreeBSD host.fairview-park.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan  5 21:11:58 UTC 2010     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Thanks for any help.

-Kurt
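The dmesg listing can be cross-checked against the pool configuration. The Python sketch below is illustrative only, and it rests on several assumptions:

- The helper names are hypothetical.
- The regexes assume the exact line format shown above.
- The set of pool member names is copied by hand from the zpool status output.
- The geometry check assumes the kernel's "MB" means 2**20 bytes (truncated) and that cylinders are computed as sectors // (heads * sectors per track).

```python
import re

# The eight attach lines from the dmesg output above, plus one capacity line
# (all eight disks report the same geometry).
DMESG = """\
da0 at mpt0 bus 0 target 8 lun 0
da1 at mpt0 bus 0 target 9 lun 0
da2 at mpt0 bus 0 target 10 lun 0
da3 at mpt0 bus 0 target 11 lun 0
da4 at mpt0 bus 0 target 12 lun 0
da5 at mpt0 bus 0 target 13 lun 0
da6 at mpt0 bus 0 target 14 lun 0
da7 at mpt0 bus 0 target 15 lun 0
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
"""

ATTACH_RE = re.compile(r"^(da\d+) at mpt0 bus \d+ target (\d+) lun \d+$", re.M)
SIZE_RE = re.compile(
    r"^da\d+: (\d+)MB \((\d+) (\d+) byte sectors: (\d+)H (\d+)S/T (\d+)C\)$", re.M
)


def attached_disks(text):
    """Map each daN unit to its mpt target number (hypothetical helper)."""
    return {unit: int(target) for unit, target in ATTACH_RE.findall(text)}


def check_geometry(text):
    """Recompute the MB and cylinder figures from the sector count.

    Assumption: MB = bytes // 2**20 and cylinders = sectors // (heads * S/T).
    """
    mb, sectors, secsize, heads, spt, cyl = map(int, SIZE_RE.search(text).groups())
    return {
        "mb_matches": sectors * secsize // 2**20 == mb,
        "cyl_matches": sectors // (heads * spt) == cyl,
        "bytes": sectors * secsize,
    }


disks = attached_disks(DMESG)
# Names that appear in the zpool status config (da6 appears twice there).
in_pool = {"da0", "da1", "da2", "da3", "da4", "da5", "da6"}
print(sorted(set(disks) - in_pool))  # prints: ['da7']
print(check_geometry(DMESG))
```

On this output, the sketch reports da7 (target 15) as the one attached disk that the pool configuration never names. It also confirms that 2930277168 sectors of 512 bytes (1500301910016 bytes) matches the reported 1430799MB and 182401 cylinders.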