Date: Tue, 26 Jan 2010 08:46:19 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS "zpool replace" problems
Message-ID: <20100126164619.GA50461@icarus.home.lan>
In-Reply-To: <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de>
References: <20100126143021.GA47535@icarus.home.lan> <20100126160320.6ed67b92.gerrit@pmp.uni-hannover.de>

On Tue, Jan 26, 2010 at 04:03:20PM +0100, Gerrit Kühn wrote:
> On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
> <freebsd@jdc.parodius.com> wrote about Re: ZFS "zpool replace" problems:
> JC> 2) How did you attach ad18? Did you tell the system about it using
> JC> atacontrol? If so, what commands did you use?
>
> Yes. The drives did not appear automatically (verified with atacontrol
> list). Then I first tried reinit ata9, but that did not work out, so I did
> a detach/attach for ata9, then the drive was there (with list and also
> the device node appeared).

The procedure -- at least on Intel controllers in AHCI mode -- is:

- zpool offline <pool> <disk>
- atacontrol detach ataX (where X = channel associated with disk)
- Physically remove bad disk
- Physically insert new disk
- Wait 15 seconds for stuff to settle
- atacontrol attach ataX (where X = previous channel detached)
- zpool replace <pool> <disk>
- zpool online <pool> <disk>

"reinit" shouldn't be needed at all -- in fact, I've seen reinit cause
some craziness (even on Intel controllers), including a system deadlock,
but this was back during the RELENG_6 and RELENG_7 days. Great
improvements have been made to ata(4) since then.

If you need me to validate the above procedure (it's been a while since
I've had to hot-swap a disk), I can do so. I do have a 4-disk Supermicro
SuperServer 5015B-MTB (ICH9-based) sitting on my workbench which I can
test with.

> Meanwhile I took out the ad18 drive again and tried to use a different
> drive. But that was listed as "UNAVAIL" with corrupted data by zfs.
> Probably it already branded the disk for resilvering and is looking for
> exactly this one now. I also put in the disk which caused the problem
> above again. The resilvering process started again, but very soon the
> drive got detached again resulting in the same situation I described above.

It honestly sounds like hot-swapping is causing some chaos on your system.
Are all of the controllers involved configured for AHCI? If not, physical
removal/insertion should be done only when the system power is off. If
so, mav@ or others may be able to help figure out what's going on in the
underlying ata(4) layer.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
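
The hot-swap steps listed in the message above can be sketched as a small
dry-run shell helper. This is an illustration only, not something taken from
the thread or tested on real hardware: the function name hotswap_plan and the
pool name "tank" are hypothetical, while ad18 and ata9 are the disk and
channel mentioned in the discussion. It prints the commands instead of
running them, so each one can be reviewed and executed by hand around the
physical swap.

```shell
#!/bin/sh
# Dry-run sketch: print the hot-swap command sequence described above.
# hotswap_plan is a hypothetical helper; it never executes zpool or
# atacontrol itself, it only prints what would be run.

hotswap_plan() {
    pool=$1             # ZFS pool name (hypothetical example: tank)
    disk=$2             # disk device being replaced, e.g. ad18
    channel=$3          # ATA channel the disk sits on, e.g. ata9
    settle=${4:-15}     # seconds to wait after inserting the new disk

    printf '%s\n' "zpool offline $pool $disk"
    printf '%s\n' "atacontrol detach $channel"
    printf '%s\n' "# physically remove the bad disk"
    printf '%s\n' "# physically insert the new disk"
    printf '%s\n' "sleep $settle"
    printf '%s\n' "atacontrol attach $channel"
    printf '%s\n' "zpool replace $pool $disk"
    printf '%s\n' "zpool online $pool $disk"
}

# Example: show the plan for disk ad18 on channel ata9 in pool "tank".
hotswap_plan tank ad18 ata9
```

Printing rather than executing keeps the two physical steps visible in the
sequence and avoids running anything against a live pool by accident.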