From: Wes Morgan
To: freebsd-fs@freebsd.org
Date: Sat, 27 Dec 2008 13:59:52 -0600 (CST)
Subject: zpool devices "stuck" (was zpool resilver restarting)

On Fri, 26 Dec 2008, Wes Morgan wrote:

> On Fri, 26 Dec 2008, Wes Morgan wrote:
>
>> I just did a zpool replace on a new drive, and now it's resilvering.
>> Only, when it gets about 20mb resilvered it restarts. I can see all the
>> drive activity simply halting for a period then resuming in gstat. I see
>> some bugs in the opensolaris tracker about this, but no resolutions.
>> It doesn't seem to be related to calling "zpool status" because I can watch
>> gstat and see it restarting... Anyone seen this before, and hopefully have
>> a workaround...?
>>
>> The pool lost a drive on Wednesday and was running with a device missing,
>> however due to the device numbering changing on the scsi bus, I had to
>> export/import the pool to get it to come up, the same for after replacing
>> it.
>
> Replying to myself with some more information. zpool history -l -i shows the
> scrub loop happening:
>
> 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]
> 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile]
> 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user root on volatile]
> 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user root on volatile]
> 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]

It seems that the resilver and the drive replacement were "fighting" each
other somehow. Detaching the new drive allowed the resilver to complete, but
now I'm stuck with two nonexistent devices trying to replace each other, and
I can't replace a device that is already being replaced:

  replacing                 UNAVAIL      0 36.4K     0  insufficient replicas
    17628927049345412941    FAULTED      0     0     0  was /dev/da4
    5474360425105728553     FAULTED      0     0     0  was /dev/da4

errors: No known data errors

So, how the heck do I cancel that replacement and restart it using /dev/da4?
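The scrub loop in the quoted `zpool history -l -i` excerpt can be checked mechanically rather than by eye. What follows is a minimal Python sketch. It assumes the history line format shown in this message, and other ZFS versions may print it differently. It also applies a heuristic that is an assumption, not zpool behaviour: a `scrub done ... complete=0` record followed immediately by a new `scrub` start with the same txg counts as one restart. The function names and the `HISTORY` sample list are hypothetical and are not part of the zpool tools.

```python
import re
from datetime import datetime

# Matches one internal scrub record as it appears in the quoted
# `zpool history -l -i` output (format assumed from this message only).
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) "
    r"\[internal pool scrub(?P<done> done)? txg:(?P<txg>\d+)\](?P<rest>.*)$"
)

# The eight history lines quoted above, unwrapped.
HISTORY = [
    "2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]",
    "2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]",
    "2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile]",
    "2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]",
    "2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user root on volatile]",
    "2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]",
    "2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user root on volatile]",
    "2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]",
]


def find_scrub_restarts(history_lines):
    """Return (timestamp, txg) for each restart.

    Heuristic (an assumption): a 'scrub done' record with complete=0 that is
    immediately followed by a new 'scrub' start in the same txg is a restart.
    """
    events = []
    for line in history_lines:
        m = EVENT_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d.%H:%M:%S")
            events.append((ts, int(m.group("txg")), m.group("done") is not None, m.group("rest")))
    restarts = []
    for prev, cur in zip(events, events[1:]):
        aborted = prev[2] and "complete=0" in prev[3]
        if aborted and not cur[2] and cur[1] == prev[1]:
            restarts.append((cur[0], cur[1]))
    return restarts


def restart_intervals(restarts):
    """Whole seconds elapsed between consecutive restarts."""
    return [int((b[0] - a[0]).total_seconds()) for a, b in zip(restarts, restarts[1:])]


if __name__ == "__main__":
    found = find_scrub_restarts(HISTORY)
    print(len(found), "restarts at txgs", [txg for _, txg in found])
    print("seconds between restarts:", restart_intervals(found))
```

On the quoted lines the sketch reports four restarts, about 97 to 98 seconds apart. Every start record in the excerpt also reuses the same mintxg=3 and maxtxg=6463720 range, which matches the description of the resilver starting over rather than making progress.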