Date:      Fri, 6 Sep 2024 11:02:14 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Chris Ross <cross+freebsd@distal.com>
Cc:        mike tancsa <mike@sentex.net>, FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: Unable to replace drive in raidz1
Message-ID:  <CAOtMX2i_zFYuOnEK_aVkpO_M8uJCvGYW%2BSzLn3OED4n5fKFoEA@mail.gmail.com>
In-Reply-To: <CB79EC2B-E793-4561-95E7-D1CEEEFC1D72@distal.com>
References:  <5ED5CB56-2E2A-4D83-8CDA-6D6A0719ED19@distal.com> <AC67D073-D476-41F5-AC53-F671430BB493@distal.com> <CAOtMX2h52d0vtceuwcDk2dzkH-fZW32inhk-dfjLMJxetVXKYg@mail.gmail.com> <CB79EC2B-E793-4561-95E7-D1CEEEFC1D72@distal.com>

On Fri, Sep 6, 2024 at 10:51 AM Chris Ross <cross+freebsd@distal.com> wrote:
>
>
>
> > On Sep 6, 2024, at 11:32, Alan Somers <asomers@freebsd.org> wrote:
> >
> > "zpool replace" is indeed the correct command.  There's no need to run
> > "zpool offline" first, and "zpool remove" is wrong.  Since "zpool
> > replace" is still failing, are you sure that da10 is still the correct
> > device name after all disks got renumbered?  If you're sure, then you
> > might run "zdb -l /dev/da10" to see what ZFS thinks is on that disk.
> >
>
> I can confirm that da10 is still the new disk I put into place of prior da3.
>
>
> > On Sep 6, 2024, at 11:43, mike tancsa <mike@sentex.net> wrote:
> > I would triple check to see what the devices are that are part of the
> > pool.  I wish there was a way to tell zfs to only display one or the
> > other.  So list out what diskid/DISK-K1GMBN9D, diskid/DISK-K1GMEDMD...
> > to diskid/DISK-3WJ7ZMMJ are in terms of /dev/da* actually are.  I have
> > some controllers that will re-order the disks on every reboot.  glabel
> > status and camcontrol devlist should help verify
>
>
> camcontrol devlist lets me know that the three HGST drives making up
> zraid1-1 are da3,da4,da5 and the three WD drives making up
> zraid1-2 are da6,da7,da8.  So, like before, just moved down a
> number because the prior da3 went away and a new disk in that
> physical slot became da10.  (da9 is a loose JBOD single with ufs
> on it, previously da10, in slot 12 of 12)
>
> da10 is in fact still the disk in slot3 of the chassis, zdb -l shows
> the below.  I did add and remove it as a spare while trying things,
> that may be why it shows up this way.
>
>              - Chris
>
> % sudo zdb -l /dev/da10
> ------------------------------------
> LABEL 0
> ------------------------------------
>     version: 5000
>     name: 'tank'
>     state: 0
>     txg: 0
>     pool_guid: 3456317866677065800
>     errata: 0
>     hostid: 2747523522
>     hostname: 'frizzen02.devit.ciscolabs.com'
>     top_guid: 2495145666029787532
>     guid: 2495145666029787532
>     vdev_children: 3
>     vdev_tree:
>         type: 'disk'
>         id: 0
>         guid: 2495145666029787532
>         path: '/dev/da10'
>         phys_path: 'id1,enc@n584b2612f2c321bd/type@0/slot@3/elmdesc@ArrayDevice03'
>         whole_disk: 1
>         metaslab_array: 0
>         metaslab_shift: 0
>         ashift: 12
>         asize: 22000965255168
>         is_log: 0
>         create_txg: 18008413
>     features_for_read:
>         com.delphix:hole_birth
>         com.delphix:embedded_data
>     create_txg: 18008413
>     labels = 0 1 2 3

This looks like you got into a split-brain situation where the disks
have inconsistent labels.  Most disks think that da10 is not a member
of the pool, but da10 thinks that it is.  Perhaps you added it as a
spare, then physically removed it, and then did a "zpool remove" to
remove the spare from the configuration?  If you're very very very
sure that there is no data on da10 that you care about, you can do
"zpool labelclear -f /dev/da10"
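
Before clearing anything, it may be worth confirming the mismatch from
both sides. The following is only a rough sketch: label_field is a
hypothetical helper (not part of ZFS) that pulls one field out of
"zdb -l" output, assuming the "key: value" layout shown in the label
quoted above. The pool name "tank" and the device /dev/da10 are taken
from this thread.

```shell
#!/bin/sh
# label_field FIELD
# Hypothetical helper: read "zdb -l" output on stdin and print the value
# of the first line whose key is FIELD (for example name, pool_guid, or
# guid), with any surrounding single quotes removed.
label_field() {
    awk -v key="$1" -v q="'" '
        $1 == key ":" {
            v = $2
            gsub(q, "", v)   # strip the quotes zdb puts around strings
            print v
            exit             # first match only: the top-level field
        }'
}

# Example use on the affected host (needs root, so left commented out):
#   guid=$(zdb -l /dev/da10 | label_field guid)
#   pool=$(zdb -l /dev/da10 | label_field name)
#   if zpool status -g "$pool" | grep -q "$guid"; then
#       echo "pool $pool lists vdev $guid"
#   else
#       echo "pool $pool does not list vdev $guid (stale label on disk)"
#   fi
```

If the pool does not list the GUID that the disk's own label claims,
that matches the inconsistency described above, and the labelclear
command is the destructive step that resolves it.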


