Date: Sun, 1 Jan 2012 23:51:43 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Michael DeMan <freebsd@deman.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs detach/replace
Message-ID: <20120102075143.GA84030@icarus.home.lan>
In-Reply-To: <D9234680-D147-42B9-9DAC-42D0802A418D@deman.com>
References: <8EA721E0-977D-483C-AC06-1040B87E0AA7@deman.com>
 <CAHcKe7kXGFcuMJTL3UxMgfeBZ1vsVJOq8sBc0H76BLP_fUmQkQ@mail.gmail.com>
 <C7D4513B-ABF5-4854-8B6C-7AA47E1B72CF@deman.com>
 <20120101195411.GA73487@icarus.home.lan>
 <D9234680-D147-42B9-9DAC-42D0802A418D@deman.com>

On Sun, Jan 01, 2012 at 09:09:37PM -0800, Michael DeMan wrote:
> Sounds realistic to me that the 'replace' command works after a
> 'detach' is properly done.

No, that's not correct either. "zpool detach" means "remove this device
from the mirror" (and applies only to mirrors). Per the man page:

     zpool detach pool device

         Detaches device from a mirror. The operation is refused if
         there are no other valid replicas of the data.

Therefore, there's no way you could do "zpool detach" followed by a
"zpool replace", because once you do the "detach", there's no device
shown as part of the pool for you to issue a "replace" command.

You can't issue "zpool detach" on raidzX pools either. Proof:

# mdconfig -a -t malloc -o reserve -s 64m -u 0
# mdconfig -a -t malloc -o reserve -s 64m -u 1
# mdconfig -a -t malloc -o reserve -s 64m -u 2
# zpool create testpool raidz1 md0 md1 md2
# zpool status testpool
  pool: testpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors
# zpool detach testpool md1
cannot detach md1: only applicable to mirror and replacing vdevs

If you read the link I provided you (to my blog), you'd see that I *did
not* issue "zpool detach" anywhere during the procedure.

Furthermore, the "zpool offline" commands I issued prior to "zpool
replace" aren't really necessary either -- I did that solely as a
nicety.

> In my case, I forgot to do the 'detach' - just did the 'replace' after
> changing the drive (and after a reboot). From there, ZFS gets in the
> state below - where it automagically created label/ada5LABEL/old and
> appears to be wanting to do the rebuild onto label/ada5LABEL as if it
> is part of a mirrored pair?

I can't answer this. I avoid using disk labels like the plague, so
there may be something going on with those that explains the behaviour
you're seeing. Others will need to help here.

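For contrast with the raidz1 proof above, here is a minimal sketch of
the same kind of test against a mirror, where "zpool detach" is
accepted. It is a sketch under assumptions, not something taken from
this thread: the pool name testmirror, the md unit numbers 10-12 and
the run/mirror_detach_demo helpers are all made up for illustration,
and the script only prints the commands unless DRY_RUN=0 is set (as
root, on a scratch machine). The final "zpool replace" is expected to
be refused, since md11 is no longer listed in the pool after the
detach.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (root, scratch FreeBSD box only).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command, or execute it when DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical demo; "testmirror" and md units 10-12 are made-up names.
mirror_detach_demo() {
    # Three 64MB memory-backed disks, as in the raidz1 proof.
    run mdconfig -a -t malloc -o reserve -s 64m -u 10
    run mdconfig -a -t malloc -o reserve -s 64m -u 11
    run mdconfig -a -t malloc -o reserve -s 64m -u 12
    run zpool create testmirror mirror md10 md11
    # Accepted here, because md11 is a member of a mirror vdev.
    run zpool detach testmirror md11
    run zpool status testmirror
    # md11 is no longer part of the pool, so this should be refused.
    run zpool replace testmirror md11 md12
    # Clean up the scratch pool and the memory disks.
    run zpool destroy testmirror
    run mdconfig -d -u 10
    run mdconfig -d -u 11
    run mdconfig -d -u 12
}

mirror_detach_demo
```

Run as-is, it lists the command sequence for review; nothing touches a
real pool until DRY_RUN=0 is set explicitly.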
> This pool was built on FreeBSD 8.0, with an operating system update to
> 8.1 after that. Possibly I could try and update the ZFS pool version
> or something if this is fixed now?

Let's be specific here: the pool version that's on the disks (that is
to say, the actual format of the data on the disks as shown by "zpool
upgrade -v") almost certainly has nothing to do with the issue you're
seeing.

Anyone running ZFS on FreeBSD should be using RELENG_8 or newer. That
means 8.2-STABLE, as recent as possible, or newer. Bugfixes for all
sorts of things have been committed between 8.0 and 8.1, and between
8.1 and 8.2. I spent a very long time trying to track all of these
changes; you can read old posts of mine on the mailing lists if you
wish. These days, I just tell people to run RELENG_8 as recent as
possible and be done with it.

> I know for a fact, back with FreeBSD 7.x, that this same scenario
> could occur. Basically on below - my old notes show there is no way
> to fix the situation with ada5LABEL and ada5LABEL/old without
> destroying and rebuilding the pool. Any attempts to 'detach',
> 'offline' or anything else on either of those two logical entries
> fails with a 'no valid replicas'.

I wouldn't be surprised. You're not going to get good support here for
RELENG_7. I've proven to you already that "zpool replace" on RELENG_8
-- even RELENG_8 as of August 2010 -- works just fine. I'm really not
sure what else to tell you. There may be quirks/oddities pertaining to
raidz2 (vs. raidz1) on FreeBSD, but I simply don't know.

I know for a fact "zpool replace pool XXX" on RELENG_8 works just fine
(no labels, and ahci.ko is used) by doing this procedure:

1. Yank physical (bad) disk.
2. Wait a few seconds. The kernel will report "lost device" on the
   console (viewable via "dmesg").
3. "zpool status" at this point will, I believe, show the disk as
   "UNAVAIL" -- and only that disk.
   For systems which may act weird with disk numbering, you should make
   note of what the device string is (e.g. "ada4").
4. Insert physical (new) disk.
5. Wait a few seconds. The kernel will report ATA IDENTIFY results
   (disk model, size of disk, speed/capability, etc.) on the console
   (viewable via "dmesg"). Take note of what the device string is that
   the kernel assigned to this disk.
6. Issue one of the following commands, depending on whether the
   device strings in steps 3 and 5 are identical:

   - If identical: "zpool replace pool XXX"
     Example: "zpool replace pool ada2"
   - If different: "zpool replace pool old new"
     (old = what's in step 3, new = what's in step 5)
     Example: "zpool replace pool ada2 ada6"

FreeBSD does not have autoreplace support (the framework/shims for that
are not written), so there is no way the system will "magically issue
the 'zpool replace' command" if you physically replace a disk. There
are some weirdos ( :-) ) on the mailing lists who have tried to make
this happen automatically via devd(8), but that's a separate topic
which I won't partake in.

I've never personally experienced the device-name-changing problem *on
the fly*, but have seen it when a system reboot/reset is done between
phases. I tend to "wire down" my AHCI ports to specific device indexes
anyway. Here's what's in our /boot/loader.conf on all of our 6-port
production machines (RELENG_8 using ahci.ko):

# "Wire down" device names (ada[0-5]) to each individual port
# on the SATA/AHCI controller. This ensures that if we reboot
# with a disk missing, the device names stay the same, and stay
# attached to the same SATA/AHCI controller.
# http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/011036.html
#
hint.scbus.0.at="ahcich0"
hint.scbus.1.at="ahcich1"
hint.scbus.2.at="ahcich2"
hint.scbus.3.at="ahcich3"
hint.scbus.4.at="ahcich4"
hint.scbus.5.at="ahcich5"
hint.ada.0.at="scbus0"
hint.ada.1.at="scbus1"
hint.ada.2.at="scbus2"
hint.ada.3.at="scbus3"
hint.ada.4.at="scbus4"
hint.ada.5.at="scbus5"

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
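
The decision in step 6 of the procedure above can be sketched as a
small sh helper. This is an illustration only: replace_cmd is a
hypothetical function name, and it just prints the "zpool replace"
command line for review rather than running it. The pool and device
names are the ones used in the examples in the message.

```shell
#!/bin/sh
# replace_cmd POOL OLD NEW   (hypothetical helper, not part of zpool)
#   OLD = device string noted in step 3 (the failed disk)
#   NEW = device string noted in step 5 (the disk just inserted)
# Prints the "zpool replace" command for step 6; it does not execute it.
replace_cmd() {
    rc_pool="$1"
    rc_old="$2"
    rc_new="$3"

    if [ -z "$rc_pool" ] || [ -z "$rc_old" ] || [ -z "$rc_new" ]; then
        echo "usage: replace_cmd pool old-device new-device" >&2
        return 1
    fi

    if [ "$rc_old" = "$rc_new" ]; then
        # Same device string before and after the swap: one-argument form.
        echo "zpool replace $rc_pool $rc_old"
    else
        # The kernel assigned a different device string: name both.
        echo "zpool replace $rc_pool $rc_old $rc_new"
    fi
}

replace_cmd pool ada2 ada2    # device string unchanged
replace_cmd pool ada2 ada6    # device string changed
```

Printing the command instead of running it keeps the final step
manual, in line with the point above that nothing on FreeBSD issues
"zpool replace" automatically.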