Date: Mon, 17 May 2010 09:37:21 +0200
From: Mark Stapper <stark@mapper.nl>
To: Todd Wasson <tsw5@duke.edu>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs drive replacement issues
Message-ID: <4BF0F231.9000706@mapper.nl>
In-Reply-To: <0B97967D-1057-4414-BBD4-4F1AA2659A5D@duke.edu>
References: <0B97967D-1057-4414-BBD4-4F1AA2659A5D@duke.edu>

On 16/05/2010 20:26, Todd Wasson wrote:
> Hi everyone, I've run into some problems replacing a problematic drive
> in my pool, and am hopeful someone out there can shed some light on
> things for me, since reading previous threads and posts around the net
> hasn't helped me so far.  The story goes like this: for a couple of
> years now (since 7.0-RC something) I've had a pool of four devices:
> two 400GB drives and two 400GB slices from 500GB drives.  I've
> recently seen errors with one of the 400GB drives like this:
>
> May 11 21:33:08 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29369344
> May 11 21:33:15 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=58819968
> May 11 21:33:23 newmonkey kernel: ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=80378624
> May 11 21:34:01 newmonkey root: ZFS: vdev I/O failure, zpool=tank path=/dev/ad6 offset=262144 size=8192 error=6
> May 11 21:34:01 newmonkey kernel: ad6: FAILURE - device detached
>
> ...which also led to a bunch of IO errors showing up for that device
> in "zpool status" and prompted me to replace that drive.  Since
> finding a 400GB drive was a pain, I decided to replace it with a
> 400GB slice from a new 500GB drive.  This is when I made what I think
> was the first critical mistake: I forgot to "zpool offline" it before
> doing the replacement, so I just exported the pool, physically
> replaced the drive, made a 400GB slice on it with fdisk, and,
> noticing that the pool now referred to the old device by an ID number
> instead of its "ad6" identifier, did a
> "zpool replace tank 10022540361666252397 /dev/ad6s1".
>
> This actually prompted a scrub for some reason, and not a resilver.
> I'm not sure why.  However, I noticed that during the scrub I was
> seeing a lot of IO errors in "zpool status" on the new device (making
> me suspect that maybe the old drive wasn't bad after all, but I think
> I'll sort that out afterwards).  Additionally, the device won't
> resilver, and now it's stuck in a constant state of "replacing".
> When I try to "zpool detach" or "zpool offline" either device (old or
> new) it says there isn't a replica and refuses.  I've finally
> resorted to putting the original drive back in to try and make some
> progress, and now this is what my zpool status looks like:
>
>   pool: tank
>  state: DEGRADED
>  scrub: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         tank                       DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             ad8                    ONLINE       0     0     0
>             ad10s1                 ONLINE       0     0     0
>             ad12s1                 ONLINE       0     0     0
>             replacing              DEGRADED     0     0     8
>               ad6                  ONLINE       0     0     0
>               1667724779240260276  UNAVAIL      0   204     0  was /dev/ad6s1
>
> When I do "zpool detach tank 1667724779240260276" it says "cannot
> detach 1667724779240260276: no valid replicas".  It says the same
> thing for a "zpool offline tank 1667724779240260276".  Note the IO
> errors on the new drive (which is now disconnected), which was ad6s1.
> It could be a bad controller, a bad cable, or any number of things,
> but I can't actually test it because I can't get rid of the device
> from the zfs pool.
>
> So, does anyone have any suggestions?  Can I cancel the "replacing"
> operation somehow?  Do I have to buy a new device, back up the whole
> pool, delete it, and rebuild it?  Any help is greatly appreciated!
>
> Thanks!
>
> Todd

Hello,

You could try exporting and importing the pool with the three disks.
Then make sure the "new" drive isn't part of any zpool (low-level
format?).  Then try a "replace" again.

Have fun!
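
P.S. In rough commands, and purely from memory, that could look
something like the sketch below.  The names tank, ad6 and ad6s1 are
the ones from your zpool status, and wiping the ZFS labels with dd is
just my blunt stand-in for a low-level format, so treat this as a
sketch and double-check every device name against dmesg before
running anything: the dd lines are destructive.

  # 1) with the suspect new drive out of the box, export the pool and
  #    import it again; it should come back working (if DEGRADED) on
  #    the three good disks plus the old ad6
  zpool export tank
  zpool import tank
  zpool status -v tank

  # 2) hook the new 500GB drive up again and wipe the ZFS labels from
  #    its 400GB slice (shown here as ad6s1, but check what it is
  #    actually called once reattached) so it no longer claims to
  #    belong to any pool.  ZFS keeps two labels at the start and two
  #    at the end of the device, so zeroing a few MB at each end does
  #    the job; the oseek value is a placeholder for the slice size
  #    in MB minus 4.
  dd if=/dev/zero of=/dev/ad6s1 bs=1m count=4
  dd if=/dev/zero of=/dev/ad6s1 bs=1m oseek=<slice_size_in_MB_minus_4>

  # 3) retry the replacement, using whatever names (or GUIDs) the old
  #    disk and the new slice actually show up as at that point, and
  #    watch the resilver
  zpool replace tank ad6 ad6s1
  zpool status -v tank

If the half-finished replace is still blocking things after the
import, see the note below about detaching the UNAVAIL member first.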
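P.P.S. On actually cancelling the stuck "replacing": detaching the new
half of a replacement is normally how you abort one, and once the pool
has been re-imported with the old ad6 online and healthy, it may be
willing to let go of the UNAVAIL member by its GUID.  The "no valid
replicas" error you got suggests it currently thinks that member is
still needed, so no promises, but it costs nothing to try right after
the import:

  zpool detach tank 1667724779240260276
  zpool status -v tank

If that works, the "replacing" vdev should collapse back to a plain
ad6 and you can test the new drive and its cable outside the pool.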