Date: Mon, 03 May 2010 16:21:01 -0400
From: Kurt Lidl <kurt.lidl@cello.com>
To: freebsd-stable@freebsd.org
Subject: raidz2 recovery problem on 8.0p2
Message-ID: <4BDF302D.9020500@cello.com>
I have a machine with 12GB of memory and an mpt controller, running a ZFS raidz2 pool for (test) data storage. The system also has a ZFS mirror for the OS, home directories, etc.

I manually failed one of the disks in the JBOD shelf and watched as the mpt controller started logging errors. Eventually I tried to reboot the machine, but it panicked instead of rebooting cleanly. It also failed to write a crash dump (it got about 200MB into the dump and stopped).

Upon reboot, I saw that ZFS thought there were two da6 disk devices, which was strange, since at this point the machine should have had da0 through da6. I issued a 'zpool clear media da6' command, but that didn't resolve anything. Then I plugged the drive back into the JBOD and rebooted. Now I see the following:

user@host: zpool status media
  pool: media
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        media       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da6     FAULTED      0    98     0  corrupted data

errors: No known data errors

Note that there are *two* da6 devices listed, at least from zpool's point of view.
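A repeated vdev name is easy to miss in a long config listing. As a minimal sketch (plain Python, not a ZFS tool; the helper name `duplicate_vdevs` is hypothetical), the config lines quoted above can be scanned for names that occur more than once:

```python
from collections import Counter

# Config section of the 'zpool status media' output above (header row
# omitted). Each line starts with the vdev name, followed by STATE and
# the READ/WRITE/CKSUM error counters.
ZPOOL_CONFIG = """\
media       DEGRADED     0     0     0
  raidz2    DEGRADED     0     0     0
    da0     ONLINE       0     0     0
    da1     ONLINE       0     0     0
    da2     ONLINE       0     0     0
    da3     ONLINE       0     0     0
    da4     ONLINE       0     0     0
    da5     ONLINE       0     0     0
    da6     ONLINE       0     0     0
    da6     FAULTED      0    98     0  corrupted data
"""


def duplicate_vdevs(config_text):
    """Return the set of vdev names that appear on more than one line."""
    # The first whitespace-separated field of each non-blank line is the name.
    names = [line.split()[0] for line in config_text.splitlines() if line.strip()]
    return {name for name, count in Counter(names).items() if count > 1}


print(duplicate_vdevs(ZPOOL_CONFIG))  # {'da6'}
```

This compares only the names zpool prints; it says nothing about which physical disk or on-disk label each entry actually refers to.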
dmesg reports this:

da0 at mpt0 bus 0 target 8 lun 0
da0: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 target 9 lun 0
da1: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2 at mpt0 bus 0 target 10 lun 0
da2: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3 at mpt0 bus 0 target 11 lun 0
da3: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da3: 300.000MB/s transfers
da3: Command Queueing enabled
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da4 at mpt0 bus 0 target 12 lun 0
da4: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da4: 300.000MB/s transfers
da4: Command Queueing enabled
da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da5 at mpt0 bus 0 target 13 lun 0
da5: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da5: 300.000MB/s transfers
da5: Command Queueing enabled
da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da6 at mpt0 bus 0 target 14 lun 0
da6: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da6: 300.000MB/s transfers
da6: Command Queueing enabled
da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da7 at mpt0 bus 0 target 15 lun 0
da7: <ATA ST31500341AS CC1H> Fixed Direct Access SCSI-5 device
da7: 300.000MB/s transfers
da7: Command Queueing enabled
da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)

Any suggestions about how to get this pool back into a non-degraded state?
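The dmesg listing shows eight disks (da0 through da7, on mpt0 targets 8 to 15), while the pool config names only da0 through da6, with da6 listed twice. A minimal Python sketch can cross-check the two listings; `parse_dmesg` and `POOL_MEMBERS` are hypothetical names, and only the attach and size lines are copied in. It also checks that 2930277168 sectors x 512 bytes, truncated to whole MiB, reproduces the reported 1430799MB (about 1.5 TB per drive).

```python
import re

# Attach and capacity lines for each disk, copied from the dmesg output above.
DMESG = """\
da0 at mpt0 bus 0 target 8 lun 0
da0: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da1 at mpt0 bus 0 target 9 lun 0
da1: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da2 at mpt0 bus 0 target 10 lun 0
da2: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da3 at mpt0 bus 0 target 11 lun 0
da3: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da4 at mpt0 bus 0 target 12 lun 0
da4: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da5 at mpt0 bus 0 target 13 lun 0
da5: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da6 at mpt0 bus 0 target 14 lun 0
da6: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
da7 at mpt0 bus 0 target 15 lun 0
da7: 1430799MB (2930277168 512 byte sectors: 255H 63S/T 182401C)
"""

# Leaf vdev names under raidz2 in 'zpool status media'; da6 is listed twice.
POOL_MEMBERS = ["da0", "da1", "da2", "da3", "da4", "da5", "da6", "da6"]

ATTACH_RE = re.compile(r"^(da\d+) at mpt0 bus \d+ target (\d+) lun \d+$")
SIZE_RE = re.compile(r"^(da\d+): (\d+)MB \((\d+) (\d+) byte sectors")


def parse_dmesg(text):
    """Map each daN device to its SCSI target, reported MB and computed MiB."""
    disks = {}
    for line in text.splitlines():
        m = ATTACH_RE.match(line)
        if m:
            disks.setdefault(m.group(1), {})["target"] = int(m.group(2))
            continue
        m = SIZE_RE.match(line)
        if m:
            sectors, sector_size = int(m.group(3)), int(m.group(4))
            info = disks.setdefault(m.group(1), {})
            info["reported_mb"] = int(m.group(2))
            # sectors * bytes-per-sector, truncated to whole MiB (2**20 bytes).
            info["computed_mib"] = sectors * sector_size // 2**20
    return disks


disks = parse_dmesg(DMESG)
print(sorted(set(disks) - set(POOL_MEMBERS)))  # ['da7']
print(disks["da7"])  # {'target': 15, 'reported_mb': 1430799, 'computed_mib': 1430799}
```

The comparison is by name only. Since daN unit numbers can shift between boots when disks are removed or added, a matching name does not by itself identify the same physical drive.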
For whatever it's worth, 'uname -a' reports:

FreeBSD host.fairview-park.com 8.0-RELEASE-p2 FreeBSD 8.0-RELEASE-p2 #0: Tue Jan 5 21:11:58 UTC 2010 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

Thanks for any help.

-Kurt