Date: Sat, 27 Dec 2008 13:59:52 -0600 (CST)
From: Wes Morgan <morganw@chemikals.org>
To: freebsd-fs@freebsd.org
Subject: zpool devices "stuck" (was zpool resilver restarting)
Message-ID: <alpine.BSF.2.00.0812271356290.1614@ibyngvyr.purzvxnyf.bet>
In-Reply-To: <alpine.BSF.2.00.0812262148490.3874@ibyngvyr.purzvxnyf.bet>
References: <9461581F-F354-486D-961D-3FD5B1EF007C@rabson.org> <alpine.BSF.2.00.0812262114100.1887@ibyngvyr.purzvxnyf.bet> <alpine.BSF.2.00.0812262148490.3874@ibyngvyr.purzvxnyf.bet>
On Fri, 26 Dec 2008, Wes Morgan wrote:

> On Fri, 26 Dec 2008, Wes Morgan wrote:
>
>> I just did a zpool replace on a new drive, and now it's resilvering.
>> Only, when it gets about 20mb resilvered it restarts. I can see all the
>> drive activity simply halting for a period then resuming in gstat. I see
>> some bugs in the opensolaris tracker about this, but no resolutions. It
>> doesn't seem to be related to calling "zpool status" because I can watch
>> gstat and see it restarting... Anyone seen this before, and hopefully have
>> a workaround...?
>>
>> The pool lost a drive on Wednesday and was running with a device missing,
>> however due to the device numbering changing on the scsi bus, I had to
>> export/import the pool to get it to come up, the same for after replacing
>> it.
>
> Replying to myself with some more information. zpool history -l -i shows the
> scrub loop happening:
>
> 2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user
> root on volatile]
> 2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3
> maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user
> root on volatile]
> 2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3
> maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:43:00 [internal pool scrub done txg:6463883] complete=0 [user
> root on volatile]
> 2008-12-26.21:43:00 [internal pool scrub txg:6463883] func=1 mintxg=3
> maxtxg=6463720 [user root on volatile]
> 2008-12-26.21:44:38 [internal pool scrub done txg:6463887] complete=0 [user
> root on volatile]
> 2008-12-26.21:44:38 [internal pool scrub txg:6463887] func=1 mintxg=3
> maxtxg=6463720 [user root on volatile]

It seems that the resilver and drive replacement were "fighting" each other
somehow.
Detaching the new drive allowed the resilver to complete, but now I'm
stuck with two nonexistent devices trying to replace each other, and I
can't replace a device that is being replaced:

            replacing                 UNAVAIL      0 36.4K     0  insufficient replicas
              17628927049345412941    FAULTED      0     0     0  was /dev/da4
              5474360425105728553     FAULTED      0     0     0  was /dev/da4

errors: No known data errors

So, how the heck do I cancel that replacement and restart it using
/dev/da4?
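
For anyone wanting to confirm the same restart loop on their own pool, the "scrub done ... complete=0" records in the history output quoted above can be counted mechanically. This is only a rough sketch: it assumes the record format is exactly what appears in the quoted log, the function name `incomplete_scrubs` is made up for illustration, and the sample text is two of the quoted record pairs with the mail client's line wrapping undone.

```python
import re

# Matches the "scrub done" records as they appear in the quoted
# `zpool history -l -i` output, for example:
#   2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0
# Assumption: other ZFS versions may format these records differently.
SCRUB_DONE = re.compile(
    r"(\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) "
    r"\[internal pool scrub done txg:(\d+)\] complete=(\d+)"
)


def incomplete_scrubs(history_text):
    """Return (timestamp, txg) for every scrub pass that ended with complete=0."""
    return [
        (timestamp, int(txg))
        for timestamp, txg, complete in SCRUB_DONE.findall(history_text)
        if complete == "0"
    ]


# Two record pairs copied from the quoted log, one record per line.
sample = """\
2008-12-26.21:39:46 [internal pool scrub done txg:6463875] complete=0 [user root on volatile]
2008-12-26.21:39:46 [internal pool scrub txg:6463875] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
2008-12-26.21:41:23 [internal pool scrub done txg:6463879] complete=0 [user root on volatile]
2008-12-26.21:41:23 [internal pool scrub txg:6463879] func=1 mintxg=3 maxtxg=6463720 [user root on volatile]
"""

restarts = incomplete_scrubs(sample)
print(len(restarts), "incomplete scrub passes")
```

A steadily growing count of incomplete passes, each followed by a new scrub record with the same mintxg/maxtxg, is the pattern shown in the log above.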