From: Wes Morgan <morganw@chemikals.org>
To: freebsd-fs@freebsd.org
Date: Mon, 29 Dec 2008 00:38:49 -0600 (CST)
Subject: Re: zpool devices "stuck" (was zpool resilver restarting)

On Sat, 27 Dec 2008, Wes Morgan wrote:

> On Fri, 26 Dec 2008, Wes Morgan wrote:
>
>> On Fri, 26 Dec 2008, Wes Morgan wrote:
>>
>>> I just did a zpool replace on a new drive, and now it's resilvering.
>>> Only, when it gets about 20 MB resilvered it restarts. I can see all the
>>> drive activity simply halting for a period and then resuming in gstat.
>>> I see some bugs in the opensolaris tracker about this, but no
>>> resolutions. It doesn't seem to be related to calling "zpool status",
>>> because I can watch gstat and see it restarting... Has anyone seen this
>>> before, and hopefully found a workaround?
>>>
>>> The pool lost a drive on Wednesday and was running with a device
>>> missing; however, because the device numbering on the scsi bus changed,
>>> I had to export/import the pool to get it to come up, and again after
>>> replacing the drive.
>>
>> Replying to myself with some more information. zpool history -l -i shows
>> the scrub loop happening:
>>
>> 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]
>> 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
>> 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile]
>> 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
>> 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user root on volatile]
>> 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
>> 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user root on volatile]
>> 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
>
> It seems that the resilver and the drive replacement were "fighting" each
> other somehow.
> Detaching the new drive allowed the resilver to complete, but now I'm
> stuck with two nonexistent devices trying to replace each other, and I
> can't replace a device that is itself being replaced:
>
>         replacing               UNAVAIL      0 36.4K     0  insufficient replicas
>           17628927049345412941  FAULTED      0     0     0  was /dev/da4
>           5474360425105728553   FAULTED      0     0     0  was /dev/da4
>
> errors: No known data errors
>
> So, how the heck do I cancel that replacement and restart it using
> /dev/da4?

Ok, dear sweet mercy, I think I've dug myself out of the huge hole. I found
a bug in the OpenSolaris tracker that is basically the same as my issue:

http://bugs.opensolaris.org/view_bug.do?bug_id=6782540

So, I spent most of the weekend trying to figure out how to repair the
damage. I ended up re-creating the actual ZFS disk label for the 547xxx
device and dumping it onto the drive. After some trouble with checksums, the
system came back to life a few hours ago, and I thought I was out of the
woods when the resilver started up. However, I was not... I had simply
gotten myself back into the resilver loop that I could not stop. Back to the
drawing board...

Using gvirstor, I created a 500 GB volume (with only 100 GB available to
back it), dumped the label of the 176xxxx device onto it, and did an
export/import; the resilver then started back up. Checking gstat showed that
the true device was not being written to at all, so I realized it was going
to try to resilver the 176 device first before doing the replacement. Not
good...

After some more floundering, I discovered that I could "zpool detach" the
virstor volume, leaving me with only real devices in the pool. Except now it
did not want to do a complete and true resilver; it resilvered only a tiny
bit of data, about 20 MB or so. My wild guess is that this has something to
do with txg IDs and how the resilver tries to copy only the data that is
"new". Since there is no way (that I know of) to force a resilver with
zpool, I simply started scrubbing the array. This would probably have
worked, but it was going to take far too long and was throwing up millions
of checksum errors on the new drive. So I cancelled the scrub and figured I
could just offline the drive and replace it with itself... Nope, no dice; it
was reported as "busy". However, after mucking around with the label some
more, I was finally able to get the drive to replace itself and start
resilvering. Hopefully it will finish successfully.

I'm still not sure what went wrong. Part of it seems to be related to SCSI
devices not being wired down the way ATAPI devices are, so successive
reboots turned "offline" devices into "faulted" ones, and the pool kept
trying to write to them, just generating more errors.

Do the folks on the OpenSolaris zfs-discuss list take reports from FreeBSD
users, or do they just toss them back at you? I did boot an OpenSolaris live
CD at one point, but it couldn't match the vdevs to devices well enough to
import the pool. I don't think it would have handled things properly anyway,
given the bug I found in their database.

Hope no one ever has to deal with this themselves! Whew...
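
For the archives, this is roughly how the restart loop shows up from the
command line ("tank" below is just a stand-in for the real pool name):

  # per-disk activity; a restarting resilver shows up as all drive I/O
  # halting for a while and then resuming
  gstat

  # each aborted pass is logged as a "scrub done ... complete=0" /
  # "scrub txg:..." pair in the internal pool history
  zpool history -l -i tank

  # resilver progress as ZFS reports it
  zpool status -v tank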
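
And the sequence that finally got a real resilver going, roughly
reconstructed (same placeholder pool name; the long number is the vdev GUID
of the phantom half of the stuck "replacing" entry, taken from the status
output quoted above):

  # cancel the stuck replacement by detaching the phantom device; a
  # missing device can be named by its numeric vdev GUID
  zpool detach tank 17628927049345412941

  # then have the real disk replace itself so a fresh resilver starts
  zpool offline tank da4
  zpool replace tank da4

  # once the resilver finishes, scrub to double-check the data
  zpool scrub tank
  zpool status -v tank

In my case the one-argument replace was refused as "busy" until I had
cleaned up the on-disk label, so treat this as the general shape of the fix
rather than an exact recipe.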