Date:      Sat, 7 Feb 2009 17:04:30 -0500 (EST)
From:      Wesley Morgan <morganw@chemikals.org>
To:        Dan Cojocar <dan.cojocar@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zfs replace disk has failed
Message-ID:  <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
In-Reply-To: <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>
References:  <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>

On Tue, 3 Feb 2009, Dan Cojocar wrote:

> Hello all,
> In a mirror (ad1, ad2) configuration, one of my disks (ad1) failed.
> After replacing the failed disk with a new one using:
>  zpool replace tank ad1
> I noticed that the replace was taking very long and that the system
> was not responding. After a restart, the new disk was no longer
> recognized by the BIOS :(. I also tested it in another box and it was
> not recognized there either.
> I installed another new disk in the same location (ad1, I think). Then
> zpool status reported something like this (this is from memory,
> because I have made many changes since then; I don't remember exactly
> whether the online disk was ad1 or ad2):
>
> zpool status
>  pool: tank
> state: DEGRADED
> scrub: none requested
> config:
>
>        NAME                        STATE     READ WRITE CKSUM
>        tank                        DEGRADED     0     0     0
>          mirror                    DEGRADED     0     0     0
>            replacing               UNAVAIL      0   387     0  insufficient replicas
>              10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>              9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>            ad2                     ONLINE       0     0     0
> At this stage I thought that if I attached the new disk (ad1) to the
> mirror, I would have sufficient replicas to detach
> 9318348042598806923 (the disk that failed the second time), so I did
> an attach. After the resilvering process completed successfully, I had:
> zpool status
>  pool: tank
> state: DEGRADED
> scrub: none requested
> config:
>
>        NAME                        STATE     READ WRITE CKSUM
>        tank                        DEGRADED     0     0     0
>          mirror                    DEGRADED     0     0     0
>            replacing               UNAVAIL      0   387     0  insufficient replicas
>              10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>              9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>            ad2                     ONLINE       0     0     0
>            ad1                     ONLINE       0     0     0
> Now I'm not able to detach 9318348042598806923 :(. More bad news: if
> I try to access anything under /tank, the operation hangs. For
> example, ls /tank freezes, and if I then run zpool status in another
> console (it was working before the ls), it freezes too.
> What should I do next?
> Thanks,
> Dan
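
Just to restate the sequence as commands: the attach you describe was
presumably something like this (the exact arguments are my guess, since
ad2 was the surviving side of the mirror):

   zpool attach tank ad2 ad1

and it is the subsequent "zpool detach tank 9318348042598806923" that
is now refused or hangs.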

ZFS seems to fall over on itself if a disk replacement is interrupted and 
the replacement drive goes missing.

By attaching the disk, you now have a 3-way mirror. The two possibilities 
for you would be to roll the array back to a previous txg, which I'm not 
at all sure would work, or to create a fake device the same size as the 
array devices and put a label on it that emulates the missing device; you 
can then cancel the replacement. Once the replacement is cancelled, you 
should be able to remove the nonexistent device. Note that the labels are 
all checksummed with SHA-256, so it's not a simple hex edit (unless you 
can calculate checksums by hand as well!).
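
For the fake-device route, the rough shape of it on FreeBSD would be 
something like the following (the file path, size and md unit are only 
placeholders; the size has to match the real pool devices, and the label 
rewrite in the middle is the hard part):

   # sparse backing file, same size as ad1/ad2 (size here is a guess)
   truncate -s 500G /var/tmp/fake-ad1.img
   # attach it as a memory disk, e.g. /dev/md0
   mdconfig -a -t vnode -f /var/tmp/fake-ad1.img
   # ...rewrite the labels on md0 to carry the guid of the missing
   # vdev (this is the step that needs the checksummed labels)...
   # then the stuck replacement can be cancelled by detaching the
   # vdev that no longer exists, by its guid:
   zpool detach tank 9318348042598806923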

If you send me the first 512k of either ad1 or ad2 (off-list of course), 
I can alter the labels to be the missing guids, and you can use md 
devices and sparse files to fool zpool.
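
Grabbing the label area is just a dd of the front of the disk (device 
and output name here are only examples; the first two ZFS labels sit in 
the first 512k):

   dd if=/dev/ad2 of=/tmp/ad2-labels.bin bs=512k count=1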

-- 
This .signature sanitized for your protection


