From: Mark Stapper <stark@mapper.nl>
Date: Mon, 17 May 2010 09:37:21 +0200
To: Todd Wasson
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs drive replacement issues
Message-ID: <4BF0F231.9000706@mapper.nl>
In-Reply-To: <0B97967D-1057-4414-BBD4-4F1AA2659A5D@duke.edu>

On 16/05/2010 20:26, Todd Wasson wrote:
> Hi everyone, I've run into some problems replacing a problematic drive in my pool, and am hopeful someone out there can shed some light on things for me, since reading previous threads and posts around the net hasn't helped me so far. The story goes like this: for a couple of years now (since 7.0-RC something) I've had a pool of four devices: two 400GB drives and two 400GB slices from 500GB drives. I've recently seen errors with one of the 400GB drives like this:
>
> May 11 21:33:08 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29369344
> May 11 21:33:15 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=58819968
> May 11 21:33:23 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=80378624
> May 11 21:34:01 newmonkey root: ZFS: vdev I/O failure, zpool=tank path=/dev/ad6 offset=262144 size=8192 error=6
> May 11 21:34:01 newmonkey kernel: ad6: FAILURE - device detached
>
> ...which also led to a bunch of I/O errors showing up for that device in "zpool status" and prompted me to replace the drive. Since finding a 400GB drive was a pain, I decided to replace it with a 400GB slice from a new 500GB drive. This is when I made what I think was the first critical mistake: I forgot to "zpool offline" it before doing the replacement, so I just exported the pool, physically replaced the drive, made a 400GB slice on it with fdisk, and, noticing that the pool now referred to the old device by an ID number instead of its "ad6" identifier, did a "zpool replace tank 10022540361666252397 /dev/ad6s1".
>
> This actually prompted a scrub for some reason, and not a resilver; I'm not sure why. However, I noticed that during the scrub I was seeing a lot of I/O errors in "zpool status" on the new device (making me suspect that maybe the old drive wasn't bad after all, but I'll sort that out afterwards). Additionally, the device won't resilver, and it's now stuck in a constant state of "replacing". When I try to "zpool detach" or "zpool offline" either device (old or new), it says there isn't a replica and refuses. I've finally resorted to putting the original drive back in to try to make some progress, and now this is what my zpool status looks like:
>
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             ad8                    ONLINE       0     0     0
>             ad10s1                 ONLINE       0     0     0
>             ad12s1                 ONLINE       0     0     0
>             replacing              DEGRADED     0     0     8
>               ad6                  ONLINE       0     0     0
>               1667724779240260276  UNAVAIL      0   204     0  was /dev/ad6s1
>
> When I do "zpool detach tank 1667724779240260276" it says "cannot detach 1667724779240260276: no valid replicas". It says the same thing for "zpool offline tank 1667724779240260276". Note the I/O errors on the new drive (which is now disconnected), which was ad6s1. It could be a bad controller, a bad cable, or any number of things, but I can't actually test it because I can't get rid of the device from the zfs pool.
>
> So, does anyone have any suggestions? Can I cancel the "replacing" operation somehow? Do I have to buy a new device, back up the whole pool, delete it, and rebuild it? Any help is greatly appreciated!
>
> Thanks!
>
> Todd

Hello,

You could try exporting the pool and then importing it again with only the three good disks attached.
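Roughly like this; untested, so take it as a sketch rather than a recipe, and I'm assuming the pool will import in a degraded state with the suspect drive physically disconnected:

    # with the "new" 500GB drive unplugged, so only the three good
    # devices are visible to ZFS:
    zpool export tank
    zpool import tank
    # check whether the stuck "replacing" vdev is gone
    zpool status tank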
Then make sure the "new" drive isn't part of any zpool any more (a low-level format would do it, but wiping the ZFS labels is enough). Then try the "replace" again.

Have fun!
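P.S. If you'd rather not wait for a full low-level format, overwriting the slice with zeros destroys the ZFS labels just as well. Again only a sketch, and the device names are taken from your mail, so double-check them against "zpool status" before pointing dd at anything:

    # ZFS keeps four copies of its label, two at the start of the device
    # and two at the end, so zeroing only the front of the slice may not
    # be enough; the slow-but-simple option is to zero the whole slice:
    dd if=/dev/zero of=/dev/ad6s1 bs=1m
    # then retry the replacement, using whatever device name or numeric
    # guid "zpool status" reports for the old disk:
    zpool replace tank <old-device-or-guid> /dev/ad6s1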