Date: Sat, 7 Feb 2009 17:04:30 -0500 (EST)
From: Wesley Morgan <morganw@chemikals.org>
To: Dan Cojocar <dan.cojocar@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs replace disk has failed
Message-ID: <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
In-Reply-To: <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>
References: <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>
On Tue, 3 Feb 2009, Dan Cojocar wrote:

> Hello all,
> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
> After replacing the failed disk with a new one using:
>
>     zpool replace tank ad1
>
> I noticed that the replace was taking too long and that the system
> was not responding. After a restart the new disk was no longer
> recognized in the BIOS :(. I tested it in another box as well, and it
> was not recognized there either.
> I installed a new disk in the same location (ad1, I think). Then
> zpool status reported something like this (this is from memory,
> because I made many changes back then; I don't remember exactly
> whether the online disk was ad1 or ad2):
>
>   zpool status
>     pool: tank
>    state: DEGRADED
>    scrub: none requested
>   config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           mirror                    DEGRADED     0     0     0
>             replacing               UNAVAIL      0   387     0  insufficient replicas
>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>             ad2                     ONLINE       0     0     0
>
> At this stage I was thinking that if I attached the new disk (ad1) to
> the mirror, I would get sufficient replicas to detach
> 9318348042598806923 (this was the disk that failed the second time),
> so I did an attach. After the resilvering process completed
> successfully, I had:
>
>   zpool status
>     pool: tank
>    state: DEGRADED
>    scrub: none requested
>   config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         tank                        DEGRADED     0     0     0
>           mirror                    DEGRADED     0     0     0
>             replacing               UNAVAIL      0   387     0  insufficient replicas
>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>             ad2                     ONLINE       0     0     0
>             ad1                     ONLINE       0     0     0
>
> And I'm still not able to detach 9318348042598806923 :(. More bad
> news: if I try to access anything under /tank, the operation hangs.
> E.g. an ls /tank freezes, and if I then run zpool status in another
> console (which worked before the ls), it freezes too.
> What should I do next?
> Thanks,
> Dan

ZFS seems to fall over on itself if a disk replacement is interrupted
and the replacement drive goes missing. By attaching the disk, you now
have a 3-way mirror.

The two possibilities for you would be to roll the array back to a
previous txg, which I'm not at all sure would work, or to create a
fake device the same size as the array devices and put a label on it
that emulates the missing device; you can then cancel the replacement.
Once the replacement is cancelled, you should be able to remove the
nonexistent device. Note that the labels are all checksummed with
SHA-256, so it's not a simple hex edit (unless you can calculate
checksums by hand as well!).

If you send me the first 512k of either ad1 or ad2 (off-list of
course), I can alter the labels to carry the missing guids, and you
can use md devices and sparse files to fool zpool.
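To give you an idea of the md side of it, it would go something along
these lines. This is only a sketch -- the file path and the 500G size
are placeholders of my choosing; the backing file has to match the
size of ad2 exactly (diskinfo -v ad2 will show you the mediasize in
bytes):

    # create a sparse backing file the same size as ad2
    truncate -s 500G /var/tmp/fake-ad1.img

    # attach it as a vnode-backed md device; mdconfig prints the
    # unit it allocated (md0, md1, ...)
    mdconfig -a -t vnode -f /var/tmp/fake-ad1.img

Once the doctored labels are written onto the md device and zpool
accepts it as the missing disk, cancelling the replacement should just
be a detach of the dead replacement's guid, something like:

    # guid taken from your zpool status output above
    zpool detach tank 9318348042598806923

Treat all of that as an outline rather than a recipe, though -- I
haven't had to walk through this exact recovery myself.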
--
This .signature sanitized for your protection