Date:      Sun, 1 Jan 2012 23:51:43 -0800
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Michael DeMan <freebsd@deman.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: zfs detach/replace
Message-ID:  <20120102075143.GA84030@icarus.home.lan>
In-Reply-To: <D9234680-D147-42B9-9DAC-42D0802A418D@deman.com>
References:  <8EA721E0-977D-483C-AC06-1040B87E0AA7@deman.com> <CAHcKe7kXGFcuMJTL3UxMgfeBZ1vsVJOq8sBc0H76BLP_fUmQkQ@mail.gmail.com> <C7D4513B-ABF5-4854-8B6C-7AA47E1B72CF@deman.com> <20120101195411.GA73487@icarus.home.lan> <D9234680-D147-42B9-9DAC-42D0802A418D@deman.com>

On Sun, Jan 01, 2012 at 09:09:37PM -0800, Michael DeMan wrote:
> Sounds realistic to me that the 'replace' command works after a
> 'detach' is properly done.

No, that's not correct either.  "zpool detach" means "remove this device
from the mirror" (and applies only to mirrors).  Per the man page:

   zpool detach pool device

     Detaches  device  from  a mirror. The operation is refused if there
     are no other valid replicas of the data.

Therefore, there's no way you could do "zpool detach" followed by a
"zpool replace", because once you do the "detach", there's no device
shown as part of the pool for you to issue a "replace" command.

You can't issue "zpool detach" on raidzX pools either.  Proof:

# mdconfig -a -t malloc -o reserve -s 64m -u 0
# mdconfig -a -t malloc -o reserve -s 64m -u 1
# mdconfig -a -t malloc -o reserve -s 64m -u 2
# zpool create testpool raidz1 md0 md1 md2
# zpool status testpool
  pool: testpool
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            md0     ONLINE       0     0     0
            md1     ONLINE       0     0     0
            md2     ONLINE       0     0     0

errors: No known data errors

# zpool detach testpool md1
cannot detach md1: only applicable to mirror and replacing vdevs
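The error comes down to which vdev the disk hangs off in the "zpool status" tree: md1 sits directly under raidz1-0, not under a mirror or replacing vdev. The following is a minimal sketch, assuming "zpool status" output shaped like the transcript above. The helper names parent_vdev and can_detach are hypothetical and not part of ZFS. Nesting is inferred purely from indentation, and the check covers only the two vdev types named in the error message.

```sh
#!/bin/sh
# Hypothetical helpers, not part of ZFS.

# parent_vdev DEVICE: read "zpool status" output on stdin and print the
# name of the vdev that directly contains DEVICE (e.g. "raidz1-0").
# Nesting is inferred from indentation inside the "config:" section.
parent_vdev() {
    awk -v dev="$1" '
        /^config:/ { inconf = 1; next }
        /^errors:/ { inconf = 0 }
        inconf && NF > 0 && $1 != "NAME" {
            match($0, /^[ \t]*/)
            ind = RLENGTH
            # Pop stack entries at the same or a deeper indentation level.
            while (n > 0 && depth[n] >= ind) n--
            if ($1 == dev) {
                if (n > 0) print names[n]
                exit
            }
            # Push this entry as a potential parent of following lines.
            n++
            depth[n] = ind
            names[n] = $1
        }'
}

# can_detach DEVICE: succeed only if DEVICE sits directly under a
# mirror or replacing vdev (the cases the error message allows).
can_detach() {
    case "$(parent_vdev "$1")" in
        mirror-*|replacing-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, "zpool status testpool | parent_vdev md1" should print raidz1-0 for the pool above, and "zpool status testpool | can_detach md1" should fail, which matches what "zpool detach" itself reports.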

If you read the link I provided you (to my blog), you'd see that I *did
not* issue "zpool detach" anywhere during the procedure.  Furthermore,
the "zpool offline" commands I issued prior to "zpool replace" aren't
really necessary either -- I did that solely as a nicety.

> In my case, I forgot to do the 'detach' - just did the 'replace' after
> changing the drive (and after a reboot).  From there, ZFS gets in the
> state below - where it automagically created label/ada5LABEL/old and
> appears to be wanting to do the rebuild onto label/ada5LABEL as if it
> is part of a mirrored pair?

I can't answer this.  I avoid using disk labels like the plague, so
there may be something going on with those that explains the behaviour
you're seeing.  Others will need to help here.

> This pool was built on FreeBSD 8.0, with an operating system update to
> 8.1 after that.  Possibly I could try and update the ZFS pool version
> or something if this is fixed now?  

Let's be specific here: the pool version that's on the disks (that is to
say, the actual format of the data on the disks as shown by "zpool
upgrade -v") almost certainly has nothing to do with the issue you're
seeing.

Anyone running ZFS on FreeBSD should be using RELENG_8 or newer.  That
means 8.2-STABLE, as recent as possible, or newer.  Bugfixes for all
sorts of things have been committed between 8.0 and 8.1, and between 8.1
and 8.2.  I spent a very long time trying to track all of these changes;
you can read my old posts on the mailing lists if you wish.  These days,
I just tell people to run RELENG_8, as recent as possible, and be done
with it.

> I know for a fact, back with FreeBSD 7.x, that this same scenario
> could occur.  Basically on below - my old notes show there is no way
> to fix the situation with ada5LABEL and ada5LABEL/old without
> destroying and rebuilding the pool.  Any attempts to 'detach',
> 'offline' or anything else on either of those two logical entries
> fails with a 'no valid replicas'.

I wouldn't be surprised.  You're not going to get good support here for
RELENG_7.  I've already shown you that on RELENG_8 -- even RELENG_8
as of August 2010 -- "zpool replace" works just fine.

I'm really not sure what else to tell you.  There may be quirks/oddities
pertaining to raidz2 (vs. raidz1) on FreeBSD, but I simply don't know.
I know for a fact that "zpool replace pool XXX" works just fine on
RELENG_8 (no labels, with ahci.ko in use) when following this procedure:

1. Yank physical (bad) disk.
2. Wait a few seconds.  The kernel will report "lost device" on the
   console (viewable via "dmesg").
3. "zpool status" at this point will, I believe, show the disk as
   "UNAVAIL" -- and only that disk.  For systems which may act weird
   with disk numbering, you should make note of what the device string
   is (e.g. "ada4").
4. Insert physical (new) disk.
5. Wait a few seconds.  The kernel will report ATA IDENTIFY results
   (disk model, size of disk, speed/capability, etc.) on the console
   (viewable via "dmesg").  Take note of what the device string is
   that the kernel assigned to this disk.
6. Issue one of the following commands, depending on whether the
   device strings from steps 3 and 5 are identical:
   - If identical: "zpool replace pool XXX"
     Example: "zpool replace pool ada2"
   - If different: "zpool replace pool old new"
     (old = what's in step 3, new = what's in step 5)
     Example: "zpool replace pool ada2 ada6"
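Steps 3 and 6 can be partly scripted. The sketch below is hypothetical (the helper names unavail_devs and replace_cmd are made up for illustration) and assumes the pulled disk really shows up as UNAVAIL in the config section of "zpool status", as step 3 expects. It deliberately only prints the replace command instead of running it, since the device strings still have to be checked by hand against dmesg:

```sh
#!/bin/sh
# Hypothetical helpers for steps 3 and 6; nothing here runs zpool itself.

# unavail_devs: read "zpool status" output on stdin and print every
# device in the "config:" section whose STATE column is UNAVAIL.
unavail_devs() {
    awk '/^config:/ { inconf = 1; next }
         /^errors:/ { inconf = 0 }
         inconf && $2 == "UNAVAIL" { print $1 }'
}

# replace_cmd POOL OLD NEW: print the step-6 command.  OLD is the device
# string noted in step 3, NEW the one the kernel assigned in step 5.
replace_cmd() {
    pool=$1 old=$2 new=$3
    if [ "$old" = "$new" ]; then
        # Same device string: the one-argument form is enough.
        printf 'zpool replace %s %s\n' "$pool" "$old"
    else
        # Different device string: name both the old and the new disk.
        printf 'zpool replace %s %s %s\n' "$pool" "$old" "$new"
    fi
}
```

For example, "zpool status pool | unavail_devs" might print ada2; after confirming the new disk's name in dmesg, "replace_cmd pool ada2 ada6" prints "zpool replace pool ada2 ada6", which can then be pasted or piped into sh(1). Keeping a human in the loop mirrors the manual procedure above.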

FreeBSD does not have autoreplace support (the framework/shims for that
are not written), so there is no way the system will "magically issue
the 'zpool replace' command" if you physically replace a disk.  There
are some weirdos ( :-) ) on the mailing lists who have tried to make
this happen automatically via devd(8), but that's a separate topic which
I won't partake in.

I've never personally experienced the device-name-changing problem *on
the fly*, but have seen it in the case a system reboot/reset is done
between phases.

I tend to "wire down" my AHCI ports to specific device indexes anyway.
Here's what's in our /boot/loader.conf on all of our 6-port production
machines (RELENG_8 using ahci.ko):

# "Wire down" device names (ada[0-5]) to each individual port
# on the SATA/AHCI controller.  This ensures that if we reboot
# with a disk missing, the device names stay the same, and stay
# attached to the same SATA/AHCI controller.
# http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/011036.html
#
hint.scbus.0.at="ahcich0"
hint.scbus.1.at="ahcich1"
hint.scbus.2.at="ahcich2"
hint.scbus.3.at="ahcich3"
hint.scbus.4.at="ahcich4"
hint.scbus.5.at="ahcich5"
hint.ada.0.at="scbus0"
hint.ada.1.at="scbus1"
hint.ada.2.at="scbus2"
hint.ada.3.at="scbus3"
hint.ada.4.at="scbus4"
hint.ada.5.at="scbus5"
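The hints above follow a fixed pattern: one scbus-to-ahcich line and one ada-to-scbus line per port. As a sketch, assuming the same one-disk-per-AHCI-channel layout on a single controller (the function name gen_wiring_hints is hypothetical), the lines for an N-port machine could be generated instead of typed by hand:

```sh
#!/bin/sh
# gen_wiring_hints N (hypothetical): print loader.conf hints wiring
# ada0..ada(N-1) to scbus0..scbus(N-1), and each scbus to ahcich0..ahcich(N-1).
gen_wiring_hints() {
    n=$1
    # First block: pin each CAM bus to its AHCI channel.
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'hint.scbus.%d.at="ahcich%d"\n' "$i" "$i"
        i=$((i + 1))
    done
    # Second block: pin each ada(4) unit to its CAM bus.
    i=0
    while [ "$i" -lt "$n" ]; do
        printf 'hint.ada.%d.at="scbus%d"\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

"gen_wiring_hints 6" prints the same twelve lines, in the same order, as the excerpt above; review the output before appending it to /boot/loader.conf.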

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



