Date: Sat, 9 Jan 2010 15:35:03 -0800
From: Steven Schlansker <stevenschlansker@gmail.com>
To: freebsd-fs@freebsd.org
Subject: Re: ZFS: Can't repair raidz2 (Cannot replace a replacing device)
Message-ID: <CA577E79-C936-4EBE-81BA-E0C2940011E2@gmail.com>
In-Reply-To: <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>
References: <048AF210-8B9A-40EF-B970-E8794EC66B2F@gmail.com>
 <4B315320.5050504@quip.cz>
 <5da0588e0912221741r48395defnd11e34728d2b7b97@mail.gmail.com>
 <9CEE3EE5-2CF7-440E-B5F4-D2BD796EA55C@gmail.com>
 <alpine.BSF.2.00.0912240708020.1450@ibyngvyr>
 <5565955F-482A-4628-A528-117C58046B1F@gmail.com>
 <alpine.BSF.2.00.0912272247410.64051@ibyngvyr>
On Dec 27, 2009, at 8:59 PM, Wes Morgan wrote:
> On Sun, 27 Dec 2009, Steven Schlansker wrote:
>
>>
>> On Dec 24, 2009, at 5:17 AM, Wes Morgan wrote:
>>
>>> On Wed, 23 Dec 2009, Steven Schlansker wrote:
>>>>
>>>> Why has the replacing vdev not gone away? I still can't detach -
>>>> [steven@universe:~]% sudo zpool detach universe 6170688083648327969
>>>> cannot detach 6170688083648327969: no valid replicas
>>>> even though now there actually is a valid replica (ad26)
>>>
>>> Try detaching ad26. If it lets you do that it will abort the replacement and then you just do another replacement with the real device. If it won't let you do that, you may be stuck having to do some metadata tricks.
>>>
>>
>> errors: No known data errors
>> [steven@universe:~]% sudo zpool detach universe ad26
>> cannot detach ad26: no valid replicas
>> [steven@universe:~]% sudo zpool offline -t universe ad26
>> cannot offline ad26: no valid replicas
>>
>
> I just tried to re-create this scenario with some sparse files and I was able to detach it completely (below). There is one difference, however. Your array is returning checksum errors for the ad26 device. Perhaps this is making the system think that there is no sibling device in the replacement node that has all the data, so it denies the detach, even though logically the data would be recovered by a later scrub. Interesting. If you can determine where the detach is failing, that will help paint the complete picture.
>
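(Side note for anyone who wants a safe sandbox for this: a file-backed test pool like the one Wes describes can be thrown together roughly as follows. This is only a sketch -- the file names, sizes, and pool name are made up, and reproducing the exact stuck "replacing" state takes extra fiddling, e.g. yanking the new file mid-resilver:

# create a few sparse 1 GB backing files
truncate -s 1g /tmp/zd0 /tmp/zd1 /tmp/zd2 /tmp/zd3 /tmp/zd4
# build a throwaway raidz2 pool on the files
sudo zpool create testpool raidz2 /tmp/zd0 /tmp/zd1 /tmp/zd2 /tmp/zd3
# start a replacement so a "replacing" vdev shows up in zpool status
sudo zpool replace testpool /tmp/zd3 /tmp/zd4
sudo zpool status testpool
)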
Interestingly enough, I found a solution! It's somewhat roundabout: I replaced a different device and let it resilver completely. Then the array looked like this:
        NAME                       STATE     READ WRITE CKSUM
        universe                   DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            ad16                   ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              ad26                 ONLINE       0     0     0
              6170688083648327969  UNAVAIL      0 1.13M     0  was /dev/ad12
            ad8                    ONLINE       0     0     0
            da0                    ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            concat/ad4ex           ONLINE       0     0     0
            ad24                   ONLINE       0     0     0
            concat/ad6ex           ONLINE       0     0     0
Just for kicks, I then tried to detach -
[steven@universe:~]% sudo zpool detach universe 6170688083648327969
[steven@universe:~]% sudo zpool status
  pool: universe
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        universe          ONLINE       0     0     0
          raidz2          ONLINE       0     0     0
            ad16          ONLINE       0     0     0
            ad26          ONLINE       0     0     0
            ad8           ONLINE       0     0     0
            da0           ONLINE       0     0     0
            ad10          ONLINE       0     0     0
            concat/ad4ex  ONLINE       0     0     0
            ad24          ONLINE       0     0     0
            concat/ad6ex  ONLINE       0     0     0
Ta-da! I have no idea why this helped or how it fixed things, but if anyone hits this problem
in the future, try replacing a different device, letting it resilver completely, and then detaching the original problematic device.
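For the archives, the sequence that worked boils down to something like this (a sketch only -- the names in angle brackets are placeholders for whatever your pool actually contains, and the GUID is whatever zpool status reports for the stuck half of the replacing vdev):

# replace some OTHER healthy member of the raidz2 with a spare disk
sudo zpool replace universe <healthy-member> <spare-disk>
# wait for the resilver to finish
sudo zpool status universe
# then the stuck device can finally be detached by its GUID
sudo zpool detach universe <guid-of-stuck-device>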
