Date: Sun, 8 Feb 2009 10:44:54 +0200
From: Dan Cojocar <dan.cojocar@gmail.com>
To: Wesley Morgan <morganw@chemikals.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs replace disk has failed
Message-ID: <b37cb0970902080044p1cc69287j489bbbf4d5b96c67@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
References: <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>
            <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan <morganw@chemikals.org> wrote:
> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>
>> Hello all,
>> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
>> After replacing the failed disk with a new one using:
>>
>>   zpool replace tank ad1
>>
>> I noticed that the replacement was taking very long and that the
>> system was not responding. After a restart the new disk was no longer
>> recognized in the BIOS :(; I tested it in another box and it was not
>> recognized there either.
>> I installed a new disk in the same location (ad1, I think). zpool
>> status then reported something like this (this is from memory, because
>> I have made many changes since; I don't remember exactly whether the
>> online disk was ad1 or ad2):
>>
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>
>> At this point I thought that if I attached the new disk (ad1) to the
>> mirror I would have sufficient replicas to detach
>> 9318348042598806923 (the disk that failed the second time), so I did
>> an attach. After the resilvering process completed successfully, I had:
>>
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>             ad1                     ONLINE       0     0     0
>>
>> Now I am not able to detach 9318348042598806923 :(, and the other bad
>> news is that any access under /tank hangs: an ls /tank freezes, and a
>> zpool status run afterwards in another console (which worked before
>> the ls) freezes too.
>> What should I do next?
>> Thanks,
>> Dan
>
> ZFS seems to fall over on itself if a disk replacement is interrupted
> and the replacement drive goes missing.
>
> By attaching the disk, you now have a 3-way mirror. The two
> possibilities for you would be to roll the array back to a previous
> txg, which I'm not at all sure would work, or to create a fake device
> the same size as the array devices and put a label on it that
> emulates the missing device; you can then cancel the replacement.
> Once the replacement is cancelled, you should be able to remove the
> nonexistent device. Note that the labels are all checksummed with
> SHA-256, so it's not a simple hex edit (unless you can calculate
> checksums by hand as well!).
>
> If you send me the first 512k of either ad1 or ad2 (off-list, of
> course), I can alter the labels to carry the missing GUIDs, and you
> can use md devices and sparse files to fool zpool.

Hello Wesley,
This was a production server, so I had to restore the mirror from
backup.
Can you explain a bit how someone can alter the labels of a disk in a
pool?
Thanks,
Dan
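In outline, the fake-device scaffolding Wesley describes can be staged
with stock FreeBSD tools. What follows is only a sketch: the file path
and the 500G size are examples, "md0" is whatever unit mdconfig actually
returns, and "label.bin" stands for the hypothetical hand-forged first
512k (labels L0 and L1 carrying the missing GUID, with recomputed
SHA-256 checksums), which is the one step no stock tool performs:

    # Inspect the on-disk vdev labels on the surviving mirror half (read-only):
    zdb -l /dev/ad2

    # Create a sparse file the same size as the real pool members
    # (500G is only an example; it must match the real disks):
    truncate -s 500G /var/tmp/fake-ad1.img

    # Attach it as a memory disk; mdconfig prints the unit it allocated (e.g. md0):
    mdconfig -a -t vnode -f /var/tmp/fake-ad1.img

    # Write the hand-forged label onto the start of the memory disk.
    # "label.bin" is hypothetical; producing it is the hard part described above:
    dd if=label.bin of=/dev/md0 bs=64k

    # Once the pool sees the fake device (an export/import cycle may be
    # needed), cancelling the replacement should come down to detaching
    # the phantom vdev by GUID:
    zpool detach tank 9318348042598806923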