From owner-freebsd-fs@FreeBSD.ORG Thu Dec 24 13:17:30 2009
Date: Thu, 24 Dec 2009 07:17:23 -0600 (CST)
From: Wes Morgan <morganw@chemikals.org>
To: Steven Schlansker
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)

On Wed, 23 Dec 2009, Steven Schlansker wrote:

>
> On Dec 22, 2009, at 5:41 PM, Rich wrote:
>
>>
>> http://kerneltrap.org/mailarchive/freebsd-fs/2009/9/30/6457763 may be
>> useful to you - it's what we did when we got stuck in a resilver loop.
>> I recall being in the same state you're in right now at one point, and
>> getting out of it from there.
>>
>> I think if you apply that patch, you'll be able to cancel the
>> resilver, and then resilver again with the device you'd like to
>> resilver with.
>>
>
> Thanks for the suggestion, but the problem isn't that it's stuck
> in a resilver loop (which is what the patch seems to try to avoid)
> but that I can't detach a drive.
>
> Now I got clever and fudged a label onto the new drive (copied the first
> 50MB of one of the dying drives), ran a scrub, and have this layout -
>
>   pool: universe
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scrub: scrub completed after 20h58m with 0 errors on Wed Dec 23 11:36:43 2009
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         universe                   DEGRADED     0     0     0
>           raidz2                   DEGRADED     0     0     0
>             ad16                   ONLINE       0     0     0
>             replacing              DEGRADED     0     0 40.7M
>               ad26                 ONLINE       0     0     0  506G repaired
>               6170688083648327969  UNAVAIL      0 88.7M     0  was /dev/ad12
>             ad8                    ONLINE       0     0     0
>             concat/back2           ONLINE       0     0     0
>             ad10                   ONLINE       0     0     0
>             concat/ad4ex           ONLINE       0     0     0
>             ad24                   ONLINE       0     0     0
>             concat/ad6ex           ONLINE      48     0     0  28.5K repaired
>
> Why has the replacing vdev not gone away?  I still can't detach -
> [steven@universe:~]% sudo zpool detach universe 6170688083648327969
> cannot detach 6170688083648327969: no valid replicas
> even though now there actually is a valid replica (ad26)

Try detaching ad26. If it lets you do that, it will abort the replacement,
and then you just do another replacement with the real device.
If it won't let you do that, you may be stuck having to do some metadata
tricks.

> Additionally, running zpool clear hangs permanently and in fact freezes
> all IO to the pool.  Since I've mounted /usr from the pool, this is
> effectively death to the system.  Any other zfs commands seem to work
> okay (zpool scrub, zfs mount, etc.).  Just clear is insta-death.  I
> can't help but suspect that this is caused by the now nonsensical vdev
> configuration (replacing with one good drive and one nonexistent one)...
>
> Any further thoughts?  Thanks,
> Steven
>
>
>> - Rich
>>
>> On Tue, Dec 22, 2009 at 6:15 PM, Miroslav Lachman <000.fbsd@quip.cz> wrote:
>>> Steven Schlansker wrote:
>>>>
>>>> As a corollary, you may notice some funky concat business going on.
>>>> This is because I have drives which are very slightly different in
>>>> size (< 1MB), and whenever one of them goes down and I bring the
>>>> pool up, it helpfully (?) expands the pool by a whole megabyte and
>>>> then won't let the drive back in.  This is extremely frustrating...
>>>> is there any way to fix that?  I'm eventually going to keep
>>>> expanding each of my drives one megabyte at a time using gconcat and
>>>> space on another drive!  Very frustrating...
>>>
>>> You can avoid it by partitioning the drives to a well-known 'minimal'
>>> size (the size of the smallest disk) and using the partition instead
>>> of the raw disk.  For example ad12s1 instead of ad12 (if you create
>>> slices with fdisk) or ad12p1 (if you create partitions with gpart).
>>>
>>> You can also use labels instead of device names.
>>>
>>> Miroslav Lachman
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>>
>> --
>>
>> If you are over 80 years old and accompanied by your parents, we will
>> cash your check.
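For reference, the detach-then-replace sequence suggested above would look
roughly like the following. This is only a sketch: the pool and device names
are taken from the zpool status output earlier in the thread, and these
commands modify a live pool, so verify the names on your own system first.

```shell
# Sketch of the suggested recovery (names taken from the status output
# above; adjust for your own pool before running anything).

# 1. Detach the good half of the stuck "replacing" vdev.
zpool detach universe ad26

# 2. If that aborts the replacement, re-run the replacement against the
#    phantom device, which is still identified only by its GUID.
zpool replace universe 6170688083648327969 ad26

# 3. Watch the resilver progress.
zpool status -v universe
```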
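Miroslav's partitioning advice can be sketched numerically: choose one fixed
partition size no larger than the smallest disk, round it down to a 1 MB
boundary, and create that same partition on every disk, so a replacement
disk never comes up "too small". The device names and sector counts below
are hypothetical, purely for illustration.

```shell
# Hypothetical sector counts for three disks (as reported by, e.g.,
# 'diskinfo -v adN' on FreeBSD).
ad12=976773168
ad16=976771055
ad26=976773166

# Find the smallest disk.
min=$ad12
for s in $ad16 $ad26; do
  if [ "$s" -lt "$min" ]; then min=$s; fi
done

# Round down to a 1 MB boundary (2048 sectors of 512 bytes) for headroom.
size=$(( min / 2048 * 2048 ))
echo "common partition size: $size sectors"

# Then, per disk (example for ad12):
#   gpart create -s gpt ad12
#   gpart add -t freebsd-zfs -s ${size} ad12
# and build the pool from ad12p1, ad16p1, ... instead of the raw disks.
```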