From: Wesley Morgan <morganw@chemikals.org>
Date: Tue, 3 Feb 2009 08:07:47 -0500 (EST)
To: Javier Martín Rueda
Cc: freebsd-fs@freebsd.org
Subject: Re: Raidz2 pool with single disk failure is faulted

On Tue, 3 Feb 2009, Javier Martín Rueda wrote:

> I solved the problem. This is how I did it, in case one day it prevents
> somebody from jumping in front of a train :-)
>
> First of all, I got some insight from various sites, mailing list
> archives, documents, etc. Among them, these two were perhaps the most
> helpful:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-October/051643.html
> http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf
>
> I suspected that my uberblock was somehow corrupted and thought it
> would be worthwhile to roll back to an earlier one.
> However, my pool was raidz2, and the examples I had seen of how to do
> this used simple pools, so I tried a different approach, which in the
> end proved very successful.
>
> First, I added a couple of printfs to vdev_uberblock_load_done(), which
> is in /sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c:
>
> --- vdev_label.c.orig	2009-02-03 13:14:35.000000000 +0100
> +++ vdev_label.c	2009-02-03 13:14:52.000000000 +0100
> @@ -659,10 +659,12 @@
>
>  	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
>  		mutex_enter(&spa->spa_uberblock_lock);
> +		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
>  		if (vdev_uberblock_compare(ub, ubbest) > 0)
>  			*ubbest = *ub;
>  		mutex_exit(&spa->spa_uberblock_lock);
>  	}
> +	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
>
>  	zio_buf_free(zio->io_data, zio->io_size);
>  }
>
> After compiling and loading the zfs.ko module, I ran "zpool import" and
> these messages came up:
>
> ...
> JMR: vdev_uberblock_load_done ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ub_txg=4254782 ub_timestamp=1233545533
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ub_txg=4254781 ub_timestamp=1233545528
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ub_txg=4254780 ub_timestamp=1233545523
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ub_txg=4254779 ub_timestamp=1233545518
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
> JMR: vdev_uberblock_load_done ub_txg=4254778 ub_timestamp=1233545513
> ...
> JMR: vdev_uberblock_load_done ubbest ub_txg=4254783 ub_timestamp=1233545538
>
> So the uberblock with transaction group 4254783 was the most recent. I
> convinced ZFS to use an earlier one with this patch (note the second
> expression I added to the if statement):
>
> --- vdev_label.c.orig	2009-02-03 13:14:35.000000000 +0100
> +++ vdev_label.c	2009-02-03 13:25:43.000000000 +0100
> @@ -659,10 +659,12 @@
>
>  	if (zio->io_error == 0 && uberblock_verify(ub) == 0) {
>  		mutex_enter(&spa->spa_uberblock_lock);
> -		if (vdev_uberblock_compare(ub, ubbest) > 0)
> +		printf("JMR: vdev_uberblock_load_done ub_txg=%qd ub_timestamp=%qd\n", ub->ub_txg, ub->ub_timestamp);
> +		if (vdev_uberblock_compare(ub, ubbest) > 0 && ub->ub_txg < 4254783)
>  			*ubbest = *ub;
>  		mutex_exit(&spa->spa_uberblock_lock);
>  	}
> +	printf("JMR: vdev_uberblock_load_done ubbest ub_txg=%qd ub_timestamp=%qd\n", ubbest->ub_txg, ubbest->ub_timestamp);
>
>  	zio_buf_free(zio->io_data, zio->io_size);
>  }
>
> After compiling and loading the zfs.ko module, I ran "zpool import"
> again, but the pool was still faulted. So I decremented the limit txg
> to "< 4254782", and this time the pool came up as ONLINE. After
> crossing my fingers, I executed "zpool import z1", and it worked fine.
> No data loss, everything back to normal. The only curious thing I've
> noticed is this:
>
> # zpool status
>   pool: z1
>  state: ONLINE
> status: One or more devices could not be used because the label is
> 	missing or invalid.  Sufficient replicas exist for the pool to
> 	continue functioning in a degraded state.
> action: Replace the device using 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-4J
>  scrub: resilver completed with 0 errors on Tue Feb  3 09:26:40 2009
> config:
>
> 	NAME                     STATE     READ WRITE CKSUM
> 	z1                       ONLINE       0     0     0
> 	  raidz2                 ONLINE       0     0     0
> 	    mirror/gm0           ONLINE       0     0     0
> 	    mirror/gm1           ONLINE       0     0     0
> 	    da2                  ONLINE       0     0     0
> 	    da3                  ONLINE       0     0     0
> 	    8076139616933977534  UNAVAIL      0     0     0  was /dev/da4
> 	    da5                  ONLINE       0     0     0
> 	    da6                  ONLINE       0     0     0
> 	    da7                  ONLINE       0     0     0
>
> errors: No known data errors
>
> As you can see, the raidz2 vdev is marked ONLINE, when I think it
> should be DEGRADED. Nevertheless, the pool is readable and writable,
> and so far I haven't detected any problems. To be safe, I am extracting
> all the data and will recreate the pool from scratch, just in case.
>
> Pending questions:
>
> 1) Why did the "supposed corruption" happen in the first place? I
> advise people not to mix disks from different zpools with the same name
> in the same computer. That's what I did, and maybe it's what caused my
> problems.
>
> 2) Rolling back to an earlier uberblock seems to solve some faulted
> zpool problems. I think it would be interesting to have a program that
> lets you do it in a user-friendly way (after warning you about the
> dangers, etc.).

It would be interesting to see whether the txg from all of your labels
was the same. I would highly advise scrubbing your array. I believe the
reason your "da4" is showing up with only a guid is that ZFS now
recognizes that the da4 it sees is not the correct one. I'm still very
curious how you ended up in that situation. I wonder if you had
corruption that was unknown before you removed da4.
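As a starting point for the user-friendly rollback tool suggested in
question 2 above, the uberblock array can be walked from userland before
touching any kernel code. Below is a minimal sketch in C, based on the
label layout described in the ondiskformat0822.pdf document cited above.
It assumes a native-endian pool and ashift=9 (1 KB uberblock slots), and
reads only the first of the four labels, so treat it as illustrative
rather than a finished tool; the name ublist and the trimmed struct are
this sketch's own inventions, not part of the ZFS source.

/*
 * ublist.c - list candidate uberblocks from the first vdev label.
 *
 * Sketch only: assumes a native-endian pool and 512-byte sectors
 * (ashift=9, i.e. 1 KB uberblock slots), and reads just the first of
 * the four labels.  Offsets taken from ondiskformat0822.pdf.
 *
 *   cc -o ublist ublist.c
 *   ./ublist /dev/da2
 */
#include <sys/types.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define	UB_MAGIC	0x00bab10cULL	/* uberblock magic ("oo-ba-bloc!") */
#define	UB_ARRAY_OFF	(128 * 1024)	/* uberblock array offset in label 0 */
#define	UB_ARRAY_SIZE	(128 * 1024)	/* size of the uberblock array */
#define	UB_SLOT		1024		/* one slot per 1 KB with ashift=9 */

/* Leading fields of uberblock_t; enough for what we print here. */
struct ub_head {
	uint64_t ub_magic;
	uint64_t ub_version;
	uint64_t ub_txg;
	uint64_t ub_guid_sum;
	uint64_t ub_timestamp;
};

int
main(int argc, char **argv)
{
	union {
		struct ub_head hd;
		unsigned char pad[UB_SLOT];
	} slot;
	off_t off;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <vdev>\n", argv[0]);
		return (1);
	}
	if ((fd = open(argv[1], O_RDONLY)) == -1) {
		perror(argv[1]);
		return (1);
	}
	/* Scan every slot in label 0's uberblock array. */
	for (off = UB_ARRAY_OFF; off < UB_ARRAY_OFF + UB_ARRAY_SIZE;
	    off += UB_SLOT) {
		if (pread(fd, &slot, sizeof (slot), off) !=
		    (ssize_t)sizeof (slot))
			break;
		if (slot.hd.ub_magic != UB_MAGIC)	/* empty/foreign slot */
			continue;
		printf("offset %7jd: txg=%ju timestamp=%ju\n",
		    (intmax_t)off, (uintmax_t)slot.hd.ub_txg,
		    (uintmax_t)slot.hd.ub_timestamp);
	}
	close(fd);
	return (0);
}

Running it against each pool member (e.g. ./ublist /dev/da2) lists the
candidate txg/timestamp pairs, the same information the printf patch
above pulled out of the kernel, which makes it easy to pick a rollback
target. For the label nvlists themselves, which carry the per-label txg
mentioned above, "zdb -l /dev/da2" should dump all four labels.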