Date: Tue, 03 Feb 2009 13:40:37 +0100
From: Javier Martín Rueda <jmrueda@diatel.upm.es>
To: Wes Morgan <morganw@chemikals.org>, freebsd-fs@freebsd.org
Subject: Re: Raidz2 pool with single disk failure is faulted
Message-ID: <49883B45.3040606@diatel.upm.es>
In-Reply-To: <4987ED81.6080008@diatel.upm.es>
References: <49879C62.6070509@diatel.upm.es> <alpine.BSF.2.00.0902022155190.10729@ibyngvyr.purzvxnyf.bet> <4987ED81.6080008@diatel.upm.es>
I solved the problem. This is how I did it, in case one day it prevents
somebody from jumping in front of a train :-)

First of all, I got some insight from various sites, mailing list
archives, documents, etc. Among them, these two were the most helpful:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051643.html
http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf

I suspected that my uberblock was somehow corrupted, and thought it
would be worthwhile to roll back to an earlier uberblock. However, my
pool was raidz2 and the examples I had seen of how to do this were with
simple pools, so I tried a different approach, which in the end proved
very successful.

First, I added a couple of printf calls to vdev_uberblock_load_done(),
which is in /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c:

--- vdev_label.c.orig   2009-02-03 13:14:35.000000000 +0100
+++ vdev_label.c        2009-02-03 13:14:52.000000000 +0100
@@ -659,10 +659,12 @@
 	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
 		mutex_enter(&spa->spa_uberblock_lock);
+		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
 		if (vdev_uberblock_compare(ub, ubbest) > 0)
 			*ubbest = *ub;
 		mutex_exit(&spa->spa_uberblock_lock);
 	}
+	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
 	zio_buf_free(zio->io_data, zio->io_size);
 }

After compiling and loading the zfs.ko module, I executed "zpool import"
and these messages came up:

...
JMR: vdev_uberblock_load_done ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254782 ub_timestamp=1233545533
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254781 ub_timestamp=1233545528
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254780 ub_timestamp=1233545523
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254779 ub_timestamp=1233545518
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
JMR: vdev_uberblock_load_done ub_txg=4254778 ub_timestamp=1233545513
...
JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538

So, the uberblock with transaction group 4254783 was the most recent. I
convinced ZFS to use an earlier one with this patch (note the second
expression I added to the if statement):

--- vdev_label.c.orig   2009-02-03 13:14:35.000000000 +0100
+++ vdev_label.c        2009-02-03 13:25:43.000000000 +0100
@@ -659,10 +659,12 @@
 	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
 		mutex_enter(&spa->spa_uberblock_lock);
-		if (vdev_uberblock_compare(ub, ubbest) > 0)
+		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
+		if (vdev_uberblock_compare(ub, ubbest) > 0 && ub->ub_txg < 4254783)
 			*ubbest = *ub;
 		mutex_exit(&spa->spa_uberblock_lock);
 	}
+	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
 	zio_buf_free(zio->io_data, zio->io_size);
 }

After compiling and loading the zfs.ko module, I executed "zpool import"
and the pool was still faulted. So, I decremented the limit txg to
"< 4254782" and this time the zpool came up as ONLINE. After crossing my
fingers I executed "zpool import z1", and it worked ok.
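For readers who want to see the effect of the extra condition outside
the kernel, here is a minimal userland sketch of the same idea: pick the
newest uberblock whose txg is below a chosen limit. It is not the ZFS
code. The names ub_sketch, ub_sketch_compare and ub_sketch_pick are
made up for illustration, and the comparison assumes that uberblocks
are ordered by txg first and by timestamp second.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two uberblock fields printed in the log. */
struct ub_sketch {
	uint64_t txg;		/* transaction group number */
	uint64_t timestamp;	/* seconds since the epoch */
};

/*
 * Assumed ordering: higher txg wins, timestamp breaks ties.
 * Returns > 0 if a is newer than b, < 0 if older, 0 if equal.
 */
static int
ub_sketch_compare(const struct ub_sketch *a, const struct ub_sketch *b)
{
	if (a->txg != b->txg)
		return (a->txg > b->txg ? 1 : -1);
	if (a->timestamp != b->timestamp)
		return (a->timestamp > b->timestamp ? 1 : -1);
	return (0);
}

/*
 * Return the newest candidate whose txg is strictly below txg_limit,
 * or NULL if no candidate qualifies.
 */
static const struct ub_sketch *
ub_sketch_pick(const struct ub_sketch *cand, size_t n, uint64_t txg_limit)
{
	const struct ub_sketch *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (cand[i].txg >= txg_limit)
			continue;	/* too recent: skip it */
		if (best == NULL || ub_sketch_compare(&cand[i], best) > 0)
			best = &cand[i];
	}
	return (best);
}
```

Fed the txg/timestamp pairs from the log above, a limit of 4254783
selects txg 4254782, and a limit of 4254782 selects txg 4254781, which
matches the uberblock the pool finally imported with.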
No data loss, everything back to normal. The only curious thing I've
noticed is this:

# zpool status
  pool: z1
 state: ONLINE
status: One or more devices could not be used because the label is missing
        or invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: resilver completed with 0 errors on Tue Feb 3 09:26:40 2009
config:

        NAME                     STATE     READ WRITE CKSUM
        z1                       ONLINE       0     0     0
          raidz2                 ONLINE       0     0     0
            mirror/gm0           ONLINE       0     0     0
            mirror/gm1           ONLINE       0     0     0
            da2                  ONLINE       0     0     0
            da3                  ONLINE       0     0     0
            8076139616933977534  UNAVAIL      0     0     0  was /dev/da4
            da5                  ONLINE       0     0     0
            da6                  ONLINE       0     0     0
            da7                  ONLINE       0     0     0

errors: No known data errors

As you can see, the raidz2 vdev is marked as ONLINE, when I think it
should be DEGRADED. Nevertheless, the pool is readable and writeable,
and so far I haven't detected any problem. To be safe, I am extracting
all the data and I will recreate the pool from scratch.

Pending questions:

1) Why did the "supposed corruption" happen in the first place? I
advise people not to mix disks from different zpools with the same name
in the same computer. That's what I did, and maybe it's what caused my
problems.

2) Rolling back to an earlier uberblock seems to solve some faulted
zpool problems. I think it would be interesting to have a program that
lets you do it in a user-friendly way (after warning you about the
dangers, etc.).