Date: Tue, 4 May 2010 22:02:43 -0500 (CDT)
From: Wes Morgan <morganw@chemikals.org>
To: Kurt Lidl
Cc: freebsd-stable@freebsd.org
Subject: Re: raidz2 recovery problem on 8.0p2

On Mon, 3 May 2010, Kurt Lidl wrote:

> I have a 12GB memory machine, with a mpt controller in it, running a
> ZFS raidz2 for (test) data storage. The system also has a ZFS mirror
> in place for the OS, home directories, etc.
>
> I manually failed one of the disks in the JBOD shelf and watched as
> the mpt controller started logging errors. Ultimately, I tried to
> reboot the machine, but it panic'd instead of rebooting cleanly. It
> failed to crashdump too (got about 200MB into the dump and stopped).
>
> Upon reboot, I saw that zfs thought there were two da6 disk devices.
> Which was strange, since at this point, the machine should have had
> da0 through da6. I issued a 'zpool clear media da6' command, but
> that didn't resolve anything.
>
> Then I plugged the drive back into the JBOD and rebooted.
> Now I see the following:
>
> user@host: zpool status media
>   pool: media
>  state: DEGRADED
> status: One or more devices could not be used because the label is
>         missing or invalid.  Sufficient replicas exist for the pool
>         to continue functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         media       DEGRADED     0     0     0
>           raidz2    DEGRADED     0     0     0
>             da0     ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>             da6     FAULTED      0    98     0  corrupted data
>
> errors: No known data errors
>
> Note that there are *two* da6 devices listed, at least from zpool's
> point of view.
> A dmesg reports this:
>
> da0 at mpt0 bus 0 target 8 lun 0
> da0: Fixed Direct Access SCSI-5 device
> da0: 300.000MB/s transfers
> da0: Command Queueing enabled
> da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da1 at mpt0 bus 0 target 9 lun 0
> da1: Fixed Direct Access SCSI-5 device
> da1: 300.000MB/s transfers
> da1: Command Queueing enabled
> da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da2 at mpt0 bus 0 target 10 lun 0
> da2: Fixed Direct Access SCSI-5 device
> da2: 300.000MB/s transfers
> da2: Command Queueing enabled
> da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da3 at mpt0 bus 0 target 11 lun 0
> da3: Fixed Direct Access SCSI-5 device
> da3: 300.000MB/s transfers
> da3: Command Queueing enabled
> da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da4 at mpt0 bus 0 target 12 lun 0
> da4: Fixed Direct Access SCSI-5 device
> da4: 300.000MB/s transfers
> da4: Command Queueing enabled
> da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da5 at mpt0 bus 0 target 13 lun 0
> da5: Fixed Direct Access SCSI-5 device
> da5: 300.000MB/s transfers
> da5: Command Queueing enabled
> da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da6 at mpt0 bus 0 target 14 lun 0
> da6: Fixed Direct Access SCSI-5 device
> da6: 300.000MB/s transfers
> da6: Command Queueing enabled
> da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
> da7 at mpt0 bus 0 target 15 lun 0
> da7: Fixed Direct Access SCSI-5 device
> da7: 300.000MB/s transfers
> da7: Command Queueing enabled
> da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
>
> Any suggestions about how to get this raid back into a non-degraded state?

Have you tried exporting and importing the pool? If that doesn't work,
what is the output of zdb?
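For reference, a rough sketch of what I mean (the pool name 'media' and
the device da6 are taken from your output above; adjust if yours differ):

  # export, then re-import, so ZFS re-reads the on-disk labels and
  # rebuilds its view of the vdevs
  zpool export media
  zpool import media
  zpool status media

  # if the import doesn't clean things up, look at the labels and the
  # cached configuration next
  zdb                 # dump the cached pool configuration
  zdb -l /dev/da6     # print the ZFS labels on the suspect disk

If the import still shows the re-inserted disk as FAULTED, a
'zpool replace media da6' (as the status output suggests) should kick
off a resilver and bring the raidz2 back to a healthy state.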