From: Mark Stapper <stark@mapper.nl>
Date: Mon, 17 May 2010 09:37:21 +0200
To: Todd Wasson
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs drive replacement issues
Message-ID: <4BF0F231.9000706@mapper.nl>
In-Reply-To: <0B97967D-1057-4414-BBD4-4F1AA2659A5D@duke.edu>

On 16/05/2010 20:26, Todd Wasson wrote:
> Hi everyone, I've run into some problems replacing a problematic drive in my pool, and am hopeful someone out there can shed some light on things for me, since reading previous threads and posts around the net hasn't helped me so far. The story goes like this: for a couple of years now (since 7.0-RC something) I've had a pool of four devices: two 400GB drives and two 400GB slices from 500GB drives. I've recently seen errors with one of the 400GB drives like this:
>
> May 11 21:33:08 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29369344
> May 11 21:33:15 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=58819968
> May 11 21:33:23 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=80378624
> May 11 21:34:01 newmonkey root: ZFS: vdev I/O failure, zpool=tank path=/dev/ad6 offset=262144 size=8192 error=6
> May 11 21:34:01 newmonkey kernel: ad6: FAILURE - device detached
>
> ...which also led to a bunch of I/O errors showing up for that device in "zpool status" and prompted me to replace the drive. Since finding a 400GB drive was a pain, I decided to replace it with a 400GB slice from a new 500GB drive. This is when I made what I think was the first critical mistake: I forgot to "zpool offline" it before doing the replacement, so I just exported the pool, physically replaced the drive, made a 400GB slice on it with fdisk, and, noticing that the pool now referred to the old device by an ID number instead of its "ad6" identifier, did a "zpool replace tank 10022540361666252397 /dev/ad6s1".
>
> This actually prompted a scrub for some reason, and not a resilver; I'm not sure why. However, I noticed that during the scrub I was seeing a lot of I/O errors in "zpool status" on the new device (making me suspect that maybe the old drive wasn't bad after all, but I'll sort that out afterwards). Additionally, the device won't resilver, and it's now stuck in a constant state of "replacing". When I try to "zpool detach" or "zpool offline" either device (old or new), it says there isn't a replica and refuses. I've finally resorted to putting the original drive back in to try to make some progress, and now this is what my zpool status looks like:
>
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             ad8                    ONLINE       0     0     0
>             ad10s1                 ONLINE       0     0     0
>             ad12s1                 ONLINE       0     0     0
>             replacing              DEGRADED     0     0     8
>               ad6                  ONLINE       0     0     0
>               1667724779240260276  UNAVAIL      0   204     0  was /dev/ad6s1
>
> When I do "zpool detach tank 1667724779240260276" it says "cannot detach 1667724779240260276: no valid replicas". It says the same thing for "zpool offline tank 1667724779240260276". Note the I/O errors on the new drive (which is now disconnected), which was ad6s1. It could be a bad controller, a bad cable, or any number of things, but I can't actually test it because I can't get rid of the device from the zfs pool.
>
> So, does anyone have any suggestions? Can I cancel the "replacing" operation somehow? Do I have to buy a new device, back up the whole pool, delete it, and rebuild it? Any help is greatly appreciated!
>
> Thanks!
>
> Todd

Hello,

You could try exporting the pool and then importing it again with only the three good disks attached.
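Roughly like this; untested, so take it as a sketch rather than a recipe, and I'm assuming the pool will import in a degraded state with the suspect drive physically disconnected:

    # with the "new" 500GB drive unplugged, so only the three good
    # devices are visible to ZFS:
    zpool export tank
    zpool import tank
    # check whether the stuck "replacing" vdev is gone
    zpool status tank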
Then make sure the "new" drive isn't part of any zpool any more (a low-level format would do it, but wiping the ZFS labels is enough). Then try the "replace" again.

Have fun!
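P.S. If you'd rather not wait for a full low-level format, overwriting the slice with zeros destroys the ZFS labels just as well. Again only a sketch, and the device names are taken from your mail, so double-check them against "zpool status" before pointing dd at anything:

    # ZFS keeps four copies of its label, two at the start of the device
    # and two at the end, so zeroing only the front of the slice may not
    # be enough; the slow-but-simple option is to zero the whole slice:
    dd if=/dev/zero of=/dev/ad6s1 bs=1m
    # then retry the replacement, using whatever device name or numeric
    # guid "zpool status" reports for the old disk:
    zpool replace tank <old-device-or-guid> /dev/ad6s1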