From: Wes Morgan
To: Steven Schlansker
Cc: freebsd-fs@freebsd.org
Date: Sun, 27 Dec 2009 22:27:33 -0600 (CST)
Subject: Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)

On Sun, 27 Dec 2009, Steven Schlansker wrote:

>
> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>
>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>
>>> Why has the replacing vdev not gone away? I still can't detach -
>>> [steven@universe:~]% sudo zpool detach universe 6170688083648327969
>>> cannot detach 6170688083648327969: no valid replicas
>>> even though now there actually is a valid replica (ad26)
>>
>> Try detaching ad26. If it lets you do that it will abort the replacement and then you just do another replacement with the real device. If it won't let you do that, you may be stuck having to do some metadata tricks.
>>
>
> Sadly, no go:
>
>   pool: universe
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         universe                   DEGRADED     0     0     0
>           raidz2                   DEGRADED     0     0     0
>             ad16                   ONLINE       0     0     0
>             replacing              DEGRADED     0     0 5.04K
>               ad26                 ONLINE       0     0     0
>               6170688083648327969  UNAVAIL      0 1.08M     0  was /dev/ad12
>             ad8                    ONLINE       0     0     0
>             concat/back2           ONLINE       0     0     0
>             ad10                   ONLINE       0     0     0
>             concat/ad4ex           ONLINE       0     0     0
>             ad24                   ONLINE       0     0     0
>             concat/ad6ex           ONLINE       0     0     0
>
> errors: No known data errors
> [steven@universe:~]% sudo zpool detach universe ad26
> cannot detach ad26: no valid replicas
> [steven@universe:~]% sudo zpool offline -t universe ad26
> cannot offline ad26: no valid replicas
>

Hmm. Looking through the spa_vdev_detach() code in sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, it is "failing" on either line 3046 or 3072. If it's failing on 3046, the bug would appear to be that it doesn't count the missing device as a child, and therefore won't allow you to detach it. In that case, a "hack" might be to bypass the check by changing line 3045 to:

	if (pvd->vdev_children == 0)

If the failure is on 3072, then somehow the original device is not being counted as a valid copy, so it won't allow you to detach. That check looks like it would be dangerous to bypass.

Based on my experience with this failure, I'm betting the device counting is off and it's returning on line 3045.
You might try inserting some debugging kernel printf() calls there, or using kdb to step through it, to see which check is actually failing. If it is the one at 3045, I think bypassing it might let you detach the nonexistent device. Of course, back up your data before attempting anything of the sort!
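To make the two hypotheses concrete, here is a hypothetical toy model of the two checks described above. It is NOT the code in spa.c: the types toy_vdev_t and toy_replacing_t, the functions toy_detach_check() and thread_scenario(), and the assumption that the unpatched first check reads something like `children <= 1` are all illustrative. It only shows why a miscounted child list would trip the first check while the second (valid-copy) check would still protect the last good device.

```c
#include <stdbool.h>

/*
 * Hypothetical toy model of the two detach checks discussed above.
 * NOT the real spa_vdev_detach(); every name here is illustrative.
 */
#define TOY_MAX_CHILDREN 4

typedef struct toy_vdev {
	const char *name;
	bool has_valid_copy;	/* device holds a complete, valid copy */
} toy_vdev_t;

typedef struct toy_replacing {
	int children;		/* number of children the code *counts* */
	toy_vdev_t child[TOY_MAX_CHILDREN];
} toy_replacing_t;

typedef enum {
	DETACH_OK,
	FAIL_CHILD_COUNT,	/* stands in for the check near line 3045/3046 */
	FAIL_NO_VALID_COPY	/* stands in for the check near line 3072 */
} detach_result_t;

/*
 * May child `victim` of the replacing vdev be detached?
 * `patched` selects the hacked first check (children == 0) instead of
 * the ASSUMED original form (children <= 1).
 */
detach_result_t
toy_detach_check(const toy_replacing_t *pvd, int victim, bool patched)
{
	bool too_few = patched ? (pvd->children == 0) : (pvd->children <= 1);

	if (too_few)
		return FAIL_CHILD_COUNT;

	/* Some *other* counted child must still hold a valid copy. */
	for (int c = 0; c < pvd->children; c++) {
		if (c != victim && pvd->child[c].has_valid_copy)
			return DETACH_OK;
	}
	return FAIL_NO_VALID_COPY;
}

/*
 * The situation from this thread, under the UNCONFIRMED assumption that
 * the missing device is not counted: two devices are listed under
 * "replacing", but children == 1.
 */
toy_replacing_t
thread_scenario(void)
{
	toy_replacing_t r = {
		.children = 1,
		.child = {
			{ "ad26", true },
			{ "6170688083648327969", false },	/* was /dev/ad12 */
		},
	};
	return r;
}
```

Under that assumption, toy_detach_check(&r, 1, false) on the missing device returns FAIL_CHILD_COUNT, while the patched variant returns DETACH_OK because ad26 still counts as a valid copy. Detaching ad26 itself with the patch returns FAIL_NO_VALID_COPY, which is the safety net the second check is there to provide.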