Date: Mon, 2 Feb 2009 22:02:53 -0600 (CST)
From: Wes Morgan <morganw@chemikals.org>
To: Javier Martín Rueda
Cc: freebsd-fs@freebsd.org
Subject: Re: Raidz2 pool with single disk failure is faulted

On Tue, 3 Feb 2009, Javier Martín Rueda wrote:

> On a FreeBSD 7.1-PRERELEASE amd64 system I had a raidz2 pool made up of
> 8 disks. Due to some things I tried in the past, the pool currently
> looked like this:
>
>         z1              ONLINE
>           raidz2        ONLINE
>             mirror/gm0  ONLINE
>             mirror/gm1  ONLINE
>             da2         ONLINE
>             da3         ONLINE
>             da4         ONLINE
>             da5         ONLINE
>             da6         ONLINE
>             da7         ONLINE
>
> da2 to da7 were originally mirror/gm2 to mirror/gm7, but I replaced them
> little by little, eliminating the corresponding gmirrors at the same
> time. I don't think this is relevant to what I'm going to explain, but I
> mention it just in case...
>
> One day, after a system reboot, one of the disks (da4) was dead and
> FreeBSD renamed all of the other disks that used to come after it (da5
> became da4, da6 became da5, and da7 became da6). The pool was
> unavailable (da4 to da6 marked as corrupt and da7 as unavailable),
> because I suppose ZFS couldn't match the contents of the last 3 disks to
> their new names. I was able to fix this by inserting a blank new disk
> and rebooting; the disk names were then correct again, and the pool
> showed up as degraded because da4 was unavailable, but it was usable. I
> resilvered the pool and everything was back to normal.
>
> Yesterday, another disk died after a system reboot and the pool was
> unavailable again because of the automatic renaming of the SCSI disks.
> However, this time I didn't put in a blank disk, but another identical
> disk which I had been using in the past in a different ZFS pool on a
> different computer, with the same name (z1) and the same characteristics
> (raidz2, 8 disks). The disk hadn't been erased and its pool hadn't been
> destroyed, so it still had whatever ZFS had stored on it.
>
> After rebooting, it seems ZFS got confused when it found two different
> active pools with the same name, and it faulted the pool. I stopped ZFS
> and wiped the beginning and end of the disk with zeroes, but the problem
> persisted. Finally, I tried to export and import the pool, as I had read
> somewhere that may help, but zpool import complains about an I/O error
> (which I imagine is fictitious, because all of the disks are fine; I can
> read from them with dd without problems).
>
> The current situation is this:
>
> # zpool import
>   pool: z1
>     id: 8828203687312199578
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>         The pool may be active on another system, but can be imported
>         using the '-f' flag.
>    see: http://www.sun.com/msg/ZFS-8000-5E
> config:
>
>         z1              FAULTED   corrupted data
>           raidz2        ONLINE
>             mirror/gm0  ONLINE
>             mirror/gm1  ONLINE
>             da2         ONLINE
>             da3         ONLINE
>             da4         UNAVAIL   corrupted data
>             da5         ONLINE
>             da6         ONLINE
>             da7         ONLINE
>
> # zpool import -f z1
> cannot import 'z1': I/O error
>
> By the way, before exporting the pool, the CKSUM column in "zpool
> status" showed 6 errors. However, zpool status -v didn't give any
> additional information.
>
> How come the pool is faulted if it is raidz2 and 7 out of 8 disks are
> reported as fine? Any idea how to recover the pool? The data has to be
> in there, as I haven't done any other destructive operation as far as I
> can think of, and I imagine it comes down to some stupid little detail.
>
> I have dumped all of the labels on the 8 disks with zdb -l, and I don't
> see anything peculiar. They are fine on the 7 online disks, and there is
> no label on the da4 disk.
>
> Is there some kind of diagnostic tool similar to dumpfs, but for ZFS?
>
> I can provide additional information if needed.

I would try removing /boot/zfs/zpool.cache and re-importing, and if that
doesn't work, detach the da4 device (camcontrol stop da4 or so) and see if
it will import. Also make sure you wiped at least 512k from the front of
the drive.
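
Roughly, that sequence would look something like the following shell
sketch. It is untested and simply uses the pool and device names from your
mail (z1 and da4), so adjust as needed:

  # forget the cached pool configuration so the pool is rediscovered from
  # the on-disk labels rather than from the stale cache file
  rm /boot/zfs/zpool.cache

  # retry the import; -f is needed because the pool appears to have been
  # active on "another" system
  zpool import -f z1

  # if that still fails, spin down / detach the suspect disk so ZFS cannot
  # see its labels at all, then try the import again
  camcontrol stop da4
  zpool import -f z1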
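
On the wiping: ZFS writes four 256k vdev labels per disk, two in the first
512k and two in the last 512k, so a thorough wipe has to cover both ends.
Something along these lines should do it. Again untested, and it assumes
512-byte sectors and that the third field of diskinfo's output is the
media size in bytes, so double-check before running it against da4:

  # zero the first 512k of the disk (labels L0/L1)
  dd if=/dev/zero of=/dev/da4 bs=512k count=1

  # zero the last 1m of the disk (labels L2/L3 live in the last 512k)
  sz=$(diskinfo da4 | awk '{print $3}')     # media size in bytes
  dd if=/dev/zero of=/dev/da4 bs=512 oseek=$((sz / 512 - 2048)) count=2048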