From: Dan Cojocar
To: Wesley Morgan
Cc: freebsd-fs@freebsd.org
Date: Sun, 8 Feb 2009 10:44:54 +0200
Subject: Re: zfs replace disk has failed

On Sun, Feb 8, 2009 at 12:04 AM, Wesley Morgan wrote:
> On Tue, 3 Feb 2009, Dan Cojocar wrote:
>
>> Hello all,
>> In a mirror (ad1, ad2) configuration one of my disks (ad1) failed.
>> I replaced the failed disk with a new one using:
>> zpool replace tank ad1
>> I noticed that the replace was taking too long and that the system
>> was not responding. After a restart the new disk was no longer
>> recognized in the BIOS :(. I also tested it in another box and the
>> disk was not recognized there either.
>> I installed another new disk in the same location (ad1, I think).
>> Then zpool status reported something like this (this is from memory
>> because I made many changes back then; I don't remember exactly
>> whether the online disk was ad1 or ad2):
>>
>> zpool status
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>
>> At this stage I thought that if I attached the new disk (ad1)
>> to the mirror I would get sufficient replicas to detach
>> 9318348042598806923 (this is the disk that failed the second
>> time), so I did an attach. After the resilvering process completed
>> successfully, I had:
>>
>> zpool status
>>   pool: tank
>>  state: DEGRADED
>>  scrub: none requested
>> config:
>>
>>         NAME                        STATE     READ WRITE CKSUM
>>         tank                        DEGRADED     0     0     0
>>           mirror                    DEGRADED     0     0     0
>>             replacing               UNAVAIL      0   387     0  insufficient replicas
>>               10193841952954445329  REMOVED      0     0     0  was /dev/ad1/old
>>               9318348042598806923   FAULTED      0     0     0  was /dev/ad1
>>             ad2                     ONLINE       0     0     0
>>             ad1                     ONLINE       0     0     0
>>
>> And I'm not able to detach 9318348042598806923 :(. More bad
>> news is that if I try to access something under /tank the operation is
>> hanging, e.g. if I do an ls /tank it freezes, and if I then run
>> zpool status in another console (which was working before the ls),
>> it freezes too.
>> What should I do next?
>> Thanks,
>> Dan
>
> ZFS seems to fall over on itself if a disk replacement is interrupted and
> the replacement drive goes missing.
>
> By attaching the disk, you now have a 3-way mirror. The two possibilities
> for you would be to roll the array back to a previous txg, which I'm not
> at all sure would work, or to create a fake device the same size as the
> array devices and put a label on it that emulates the missing device, so
> that you can then cancel the replacement. Once the replacement is
> cancelled, you should be able to remove the nonexistent device. Note that
> the labels are all checksummed with sha256, so it's not a simple hex edit
> (unless you can calculate checksums by hand also!).
>
> If you send me the first 512k of either ad1 or ad2 (off-list of course), I
> can alter the labels to be the missing guids, and you can use md devices
> and sparse files to fool zpool.
>

Hello Wesley,
This was a production server, so I had to restore the mirror from the
backup. Can you explain a bit how someone can alter the labels of a disk
in a pool?
Thanks,
Dan
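
For reference, here is a minimal sketch of two ideas from the quoted
suggestion: a sparse file standing in for a disk of a given size, and why
a plain hex edit breaks a SHA-256 checksum over the edited region. It does
not reproduce the real ZFS label layout or the exact way ZFS computes its
label checksums; the function names, the file name and the 64 MiB size are
all hypothetical and chosen only for illustration.

```python
import hashlib
import os
import tempfile

# "The first 512k" mentioned in the quoted message; used here only as the
# size of the region we hash, not as a statement about the label layout.
LABEL_REGION = 512 * 1024


def make_sparse_file(path, size_bytes):
    """Create a file with the given apparent size without writing data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file; unwritten bytes read as zero
    return os.path.getsize(path)


def region_sha256(path, offset, length):
    """Return the hex SHA-256 digest of `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(length)).hexdigest()


def patch_bytes(path, offset, data):
    """Overwrite bytes in place, the way a naive hex edit would."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fake = os.path.join(tmp, "fake-ad1.img")  # hypothetical file name
        size = make_sparse_file(fake, 64 * 1024 * 1024)  # stand-in size
        before = region_sha256(fake, 0, LABEL_REGION)
        patch_bytes(fake, 1024, b"\x01")  # single-byte "hex edit"
        after = region_sha256(fake, 0, LABEL_REGION)
        print("apparent size:", size)
        print("digest changed after edit:", before != after)
```

On FreeBSD such a file could then be exposed as an md device with
something like `mdconfig -a -t vnode -f <file>`. Any stored checksum over
the edited region would have to be recomputed after the edit, which is
the step the quoted message says cannot be done with a hex editor alone.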