Date: Sun, 8 Feb 2009 10:44:54 +0200
From: Dan Cojocar <dan.cojocar@gmail.com>
To: Wesley Morgan <morganw@chemikals.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs replace disk has failed
Message-ID: <b37cb0970902080044p1cc69287j489bbbf4d5b96c67@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
References: <b37cb0970902030633i4e67c8bdg94e374e9eb824858@mail.gmail.com>
            <alpine.BSF.2.00.0902071654020.2236@fubc.purzvxnyf.bet>
On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan <morganw@chemikals.org> wrote:
> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>
>> Hello all,
>> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
>> After replacing the failed disk with a new one using:
>>
>>   zpool replace tank ad1
>>
>> I noticed that the replacement was taking very long and that the
>> system was not responding. After a restart the new disk was no longer
>> recognized in the BIOS :(; I tested it in another box and it was not
>> recognized there either.
>> I installed a new disk in the same location (ad1, I think). zpool
>> status then reported something like this (this is from memory, because
>> I have made many changes since; I don't remember exactly whether the
>> online disk was ad1 or ad2):
>>
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>
>> At this point I thought that if I attached the new disk (ad1) to the
>> mirror I would have sufficient replicas to detach
>> 9318348042598806923 (the disk that failed the second time), so I did
>> an attach. After the resilvering process completed successfully, I had:
>>
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>             ad1                     ONLINE       0     0     0
>>
>> Now I am not able to detach 9318348042598806923 :(, and the other bad
>> news is that any access under /tank hangs: an ls /tank freezes, and a
>> zpool status run afterwards in another console (which worked before
>> the ls) freezes too.
>> What should I do next?
>> Thanks,
>> Dan
>
> ZFS seems to fall over on itself if a disk replacement is interrupted
> and the replacement drive goes missing.
>
> By attaching the disk, you now have a 3-way mirror. The two
> possibilities for you would be to roll the array back to a previous
> txg, which I'm not at all sure would work, or to create a fake device
> the same size as the array devices and put a label on it that
> emulates the missing device; you can then cancel the replacement.
> Once the replacement is cancelled, you should be able to remove the
> nonexistent device. Note that the labels are all checksummed with
> SHA-256, so it's not a simple hex edit (unless you can calculate
> checksums by hand as well!).
>
> If you send me the first 512k of either ad1 or ad2 (off-list, of
> course), I can alter the labels to carry the missing GUIDs, and you
> can use md devices and sparse files to fool zpool.

Hello Wesley,
This was a production server, so I had to restore the mirror from
backup.
Can you explain a bit how someone can alter the labels of a disk in a
pool?
Thanks,
Dan
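In outline, the fake-device scaffolding Wesley describes can be staged
with stock FreeBSD tools. What follows is only a sketch: the file path
and the 500G size are examples, "md0" is whatever unit mdconfig actually
returns, and "label.bin" stands for the hypothetical hand-forged first
512k (labels L0 and L1 carrying the missing GUID, with recomputed
SHA-256 checksums), which is the one step no stock tool performs:

    # Inspect the on-disk vdev labels on the surviving mirror half (read-only):
    zdb -l /dev/ad2

    # Create a sparse file the same size as the real pool members
    # (500G is only an example; it must match the real disks):
    truncate -s 500G /var/tmp/fake-ad1.img

    # Attach it as a memory disk; mdconfig prints the unit it allocated (e.g. md0):
    mdconfig -a -t vnode -f /var/tmp/fake-ad1.img

    # Write the hand-forged label onto the start of the memory disk.
    # "label.bin" is hypothetical; producing it is the hard part described above:
    dd if=label.bin of=/dev/md0 bs=64k

    # Once the pool sees the fake device (an export/import cycle may be
    # needed), cancelling the replacement should come down to detaching
    # the phantom vdev by GUID:
    zpool detach tank 9318348042598806923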